HUMAN GENOME-DERIVED SINGLE EXON NUCLEIC ACID PROBES USEFUL
FOR ANALYSIS OF GENE EXPRESSION IN HUMAN BRAIN



CROSS REFERENCE TO RELATED APPLICATIONS 

The present application is a continuation-in-part of U.S.
patent application serial nos. 09/632,366, filed August 3,
2000 and 09/608,408, filed June 30, 2000; claims the
benefit under 35 U.S.C. § 119(e) of U.S. provisional patent
application serial nos. 60/236,359, filed September 27,
2000, 60/234,687, filed September 21, 2000, 60/207,456,
filed May 26, 2000, and 60/180,312, filed February 4, 2000;
and further claims the benefit under 35 U.S.C. § 119(a) of
UK patent application no. 0024263.6, filed October 4, 2000,
the disclosures of which are incorporated herein by
reference in their entireties.

REFERENCE TO SEQUENCE LISTING AND INCORPORATION BY 
REFERENCE THEREOF 

The present application includes a Sequence Listing in
electronic format, filed pursuant to PCT Administrative
Instructions 801 - 806 on a single CD-R disc, in
triplicate, containing a file named pto_BRAIN.txt, created
24 January 2001, having 25,840,972 bytes. The Sequence
Listing contained in said file on said disc is incorporated
herein by reference in its entirety.

Field of the Invention 

The present invention relates to genome-derived
single exon microarrays useful for verifying the expression
of regions of genomic DNA predicted to encode protein. In
particular, the present invention relates to unique
genome-derived single exon nucleic acid probes expressed in
human brain and single exon nucleic acid microarrays that
include such probes.



Background of the Invention 
For almost two decades following the invention of
general techniques for nucleic acid sequencing, Sanger et
al., Proc. Natl. Acad. Sci. USA 70(4):1209-13 (1973);
Gilbert et al., Proc. Natl. Acad. Sci. USA 70(12):3581-4
(1973), these techniques were used principally as tools to
further the understanding of proteins, known or suspected,
about which a basic foundation of biological knowledge had
already been built. In many cases, the cloning effort that
preceded sequence identification had been both informed and
directed by that antecedent biological understanding.

For example, the cloning of the T cell receptor
for antigen was predicated upon its known or suspected cell
type-specific expression, its suspected membrane
association, and the predicted assembly of its gene via
T cell-specific somatic recombination. Subsequent
sequencing efforts at once confirmed and extended
understanding of this family of proteins. Hedrick et al.,
Nature 308(5955):153-8 (1984).

More recently, however, the development of high
throughput sequencing methods and devices, in concert with
large public and private undertakings to sequence the human
and other genomes, has altered this investigational
paradigm: today, sequence information often precedes
understanding of the basic biology of the encoded protein
product.

One of the approaches to large-scale sequencing
is predicated upon the proposition that expressed
sequences (that is, those accessible through isolation of
mRNA) are of greatest initial interest. This "expressed
sequence tag" ("EST") approach has already yielded vast
amounts of sequence data (see, for example, Adams et al.,
Science 252:1651 (1991); Williamson, Drug Discov. Today
4:115 (1999)). For nucleic acids sequenced by this
approach, often the only biological information that is
known a priori with any certainty is the likelihood of
biologic expression itself. By virtue of the species and
tissue from which the mRNA had originally been obtained,
most such sequences are also annotated with the identity of
the species and at least one tissue in which expression
appears likely.

More recently, the pace of genomic sequencing has
accelerated dramatically. When genomic DNA serves as the
initial substrate for sequencing efforts, expression cannot
be presumed; often the only a priori biological information
about the sequence includes the species and chromosome (and
perhaps chromosomal map location) of origin.

With the ever-accelerating pace of sequence
accumulation by directed, EST, and genomic sequencing
approaches, and in particular with the accumulation of
sequence information from multiple genera, from multiple
species within genera, and from multiple individuals within
a species, there is an increasing need for methods that
rapidly and effectively permit the functions of nucleic
acid sequences to be elucidated. And as such functional
information accumulates, there is a further need for
methods of storing such functional information in
meaningful and useful relationship to the sequence itself;
that is, there is an increasing need for means and
apparatus for annotating raw sequence data with known or
predicted functional information.

Although the increase in the pace of genomic
sequencing is due in large part to technological changes in
sequencing strategies and instrumentation, Service, Science
280:995 (1998); Pennisi, Science 283:1822-1823 (1999),
there is an important functional motivation as well.

While it was understood that the EST approach
would rarely be able to yield sequence information about
the noncoding portions of the genome, it now also appears
the EST approach is capable of capturing only a fraction of
a genome's actual expression complexity.

For example, when the C. elegans genome was fully
sequenced, gene prediction algorithms identified over
19,000 potential genes, of which only 7,000 had been found
by EST sequencing. C. elegans Sequencing Consortium,
Science 282:2012 (1998). Analogously, the recently
completed sequence of chromosome 2 of Arabidopsis predicts
over 4000 genes, Lin et al., Nature 402:761 (1999), of
which only about 6% had previously been identified via EST
sequencing efforts. Although the human genome has the
greatest depth of EST coverage, it is still woefully short
of surrendering all of its genes. One recent estimate
suggests that the human genome contains more than 146,000
genes, which would at this point leave greater than half of
the genes undiscovered. It is now predicted that many
genes, perhaps 20 to 50%, will only be found by genomic
sequencing.

There is, therefore, a need for methods that
permit the functional regions of genomic sequence (most
importantly, but not exclusively, regions that function to
encode genes) to be identified.

Much of the coding sequence of the human genome
is not homologous to known genes, making detection of open
reading frames ("ORFs") and predictions of gene function
difficult. Computational methods exist for predicting
coding regions in eukaryotic genomes. Gene prediction
programs such as GRAIL and GRAIL II, Uberbacher et al.,
Proc. Natl. Acad. Sci. USA 88(24):11261-5 (1991); Xu et
al., Genet. Eng. 16:241-53 (1994); Uberbacher et al.,
Methods Enzymol. 266:259-81 (1996); GENEFINDER, Solovyev et
al., Nucl. Acids. Res. 22:5156-63 (1994); Solovyev et al.,
Ismb 5:294-302 (1997); and GENSCAN, Burge et al., J. Mol.
Biol. 268:78-94 (1997), predict many putative genes without
known homology or function. Such programs are known,
however, to give high false positive rates. Burset et al.,
Genomics 34:353-367 (1996). Using a consensus obtained by
a plurality of such programs is known to increase the
reliability of calling exons from genomic sequence.
Ansari-Lari et al., Genome Res. 8(1):29-40 (1998).
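
By way of illustration only, the consensus idea can be
sketched in a few lines of Python. The sketch is not taken
from any of the cited programs or from Ansari-Lari et al.:
the function name, the half-open (start, end) interval
representation, the program labels and the two-vote
threshold are all assumptions made for the example.

```python
from collections import Counter


def consensus_exons(predictions, min_votes=2):
    """Return half-open (start, end) intervals covered by >= min_votes programs.

    predictions: dict mapping a (hypothetical) program name to a list of
    half-open (start, end) exon intervals on one genomic sequence.
    """
    votes = Counter()
    for intervals in predictions.values():
        covered = set()
        for start, end in intervals:
            covered.update(range(start, end))
        votes.update(covered)  # each program votes at most once per base

    kept = sorted(pos for pos, n in votes.items() if n >= min_votes)

    # Merge runs of consecutive positions back into intervals.
    merged = []
    for pos in kept:
        if merged and merged[-1][1] == pos:
            merged[-1][1] = pos + 1
        else:
            merged.append([pos, pos + 1])
    return [tuple(interval) for interval in merged]


# Hypothetical predictions from three unnamed programs.
calls = {
    "program_a": [(100, 200), (500, 560)],
    "program_b": [(120, 210)],
    "program_c": [(90, 180), (700, 760)],
}
consensus = consensus_exons(calls)
strict = consensus_exons(calls, min_votes=3)
```

In this toy input, regions called by only one program are
dropped, which is the sense in which a consensus can reduce
false positives at the cost of some sensitivity.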

Identification of functional genes from genomic
data remains, however, an imperfect art. For example, in
reporting the full sequence of human chromosome 21, the
Chromosome 21 Mapping and Sequencing Consortium reports
that prior bioinformatic estimates of human gene number may
need to be revised substantially downwards. Nature
405:311-319 (2000); Reeves, Nature 405:283-284 (2000).

Thus, there is a need for methods and apparatus
that permit the functions of the regions identified
bioinformatically, and specifically the expression of
regions predicted to encode protein, readily to be
confirmed experimentally.

Recently, the development of nucleic acid
microarrays has made possible the automated and highly
parallel measurement of gene expression. Reviewed in
Schena (ed.), DNA Microarrays: A Practical Approach
(Practical Approach Series), Oxford University Press (1999)
(ISBN: 0199637768); Nature Genet. 21(1)(suppl):1 - 60
(1999); Schena (ed.), Microarray Biochip: Tools and
Technology, Eaton Publishing Company/BioTechniques Books
Division (2000) (ISBN: 1881299376).

It is common for microarrays to be derived from
cDNA/EST libraries, either from those previously described
in the literature, such as those from the I.M.A.G.E.
consortium, Lennon et al., Genomics 33(1):151-2 (1996), or
from the construction of "problem specific" libraries
targeted at a particular biological question, R.S. Thomas
et al., Cancer Res. (in press). Such microarrays by
definition can measure expression only of those genes found
in EST libraries, and thus have not been useful as probes
for genes discovered solely by genomic sequencing.

The utility of using whole genome nucleic acid
microarrays to answer certain biological questions has been
demonstrated for the yeast Saccharomyces cerevisiae.
DeRisi et al., Science 278:680 (1997). The vast majority of
yeast nuclear genes, approximately 95%, however, are single
exon genes, i.e., lack introns, Lopez et al., RNA
5:1135-1137 (1999); Goffeau et al., Science 274:563-67
(1996), permitting coding regions more readily to be
identified. Whole genome nucleic acid microarrays have not
generally been used to probe gene expression from more
complex eukaryotic genomes, and in particular from those
averaging more than one intron per gene.

Diseases of the brain and nervous system are a
significant cause of human morbidity and mortality.
Increasingly, genetic factors are being found that
contribute to predisposition, onset, and/or aggressiveness
of most, if not all, of these diseases. Although mutations
in single genes have been identified as causative for some
diseases of the brain and nervous system, for the most part
these disorders are believed to have polygenic etiologies.
There is a need for methods and apparatus that permit
prediction, diagnosis and prognosis of diseases of the
brain and nervous system, particularly those diseases with
polygenic etiology.

Summary of the Invention

The present invention solves these and other
problems in the art by providing methods and apparatus for
predicting, confirming, and displaying functional
information derived from genomic sequence. The present
invention also provides apparatus for verifying the
expression of putative genes identified within genomic
sequence.

In particular, the invention provides novel
genome-derived single exon nucleic acid microarrays useful
for verifying the expression of putative genes identified
within genomic sequence.

The present invention also provides compositions
and kits for the ready production of nucleic acids
identical in sequence to, or substantially identical in
sequence to, probes on the genome-derived single exon
microarrays of the present invention.

Accordingly, in a first aspect of the invention,
there is provided a spatially-addressable set of single
exon nucleic acid probes for measuring gene expression in a
sample derived from human brain, comprising a plurality of
single exon nucleic acid probes according to any one of the
nucleotide sequences set out in SEQ ID NOs: 1 - 12,821 or a
complementary sequence, or a portion of such a sequence.

By plurality is meant at least two, suitably at
least 20, most suitably at least 100, preferably at least
1000 and, most preferably, up to 5000.

In one embodiment of the first aspect, each of
said plurality of probes is separately and addressably
amplifiable.

In an alternative embodiment, each of said 
plurality of probes is separately and addressably 
isolatable from said plurality. 

In a preferred embodiment, each of said plurality
of probes is amplifiable using at least one common primer.
Preferably, each of said plurality of probes is amplifiable
using a first and a second common primer.

In yet another embodiment, said set of single
exon nucleic acid probes comprises between 50 and 20,000
probes, for example, 50 - 5000.

Suitably, said set of single exon nucleic acid
probes comprises at least 50 - 1000 discrete single exon
nucleic acid probes having a sequence as set out in any of
SEQ ID NOs: 1 - 25,434 or a complementary sequence, or a
portion of such a sequence.

Preferably, the average length of the single exon
nucleic acid probes is between 200 and 500 bp. It is
preferred that the average length should be at least 200 bp,
suitably at least 250 bp, most suitably at least 300 bp,
preferably at least 400 bp and, most preferably, 500 bp.

In another embodiment, the single exon nucleic
acid probes lack prokaryotic and bacteriophage vector
sequence. It is preferred that at least 50%, suitably at
least 60%, most suitably at least 70%, preferably at least
75%, more preferably at least 80, 85, 90, 95 or 99% of said
single exon nucleic acid probes lack prokaryotic and
bacteriophage vector sequence.

In another preferred embodiment, said single exon
nucleic acid probes lack homopolymeric stretches of A or T.
It is preferred that at least 50%, suitably at least 60%,
most suitably at least 70%, preferably at least 75%, more
preferably at least 80, 85, 90, 95 or 99% of said single
exon nucleic acid probes lack homopolymeric stretches of A
or T.

Preferably, a spatially-addressable set of single
exon nucleic acid probes in accordance with the first
aspect of the invention is addressably disposed upon a
substrate.

Suitable substrates include a filter membrane
which may, preferably, be nitrocellulose or nylon. The
nylon may, preferably, be positively-charged. Other suitable
substrates include glass, amorphous silicon, crystalline
silicon, and plastic. Further suitable materials include
polymethylacrylic, polyethylene, polypropylene,
polyacrylate, polymethylmethacrylate, polyvinylchloride,
polytetrafluoroethylene, polystyrene, polycarbonate,
polyacetal, polysulfone, cellulose acetate,
cellulose nitrate, nitrocellulose, and mixtures thereof.

In a second aspect of the invention, there is
provided a microarray comprising a spatially addressable
set of single exon nucleic acid probes in accordance with
the first aspect of the invention.

In one embodiment, a genome-derived single-exon
microarray is packaged together with such an ordered set of
amplifiable probes corresponding to the probes, or one or
more subsets of probes, thereon. In alternative
embodiments, the ordered set of amplifiable probes is
packaged separately from the genome-derived single exon
microarray.

In another aspect, the invention provides
genome-derived single exon nucleic acid probes useful for
gene expression analysis, and particularly for gene
expression analysis by microarray. In particular
embodiments of this aspect, the present invention provides
human single-exon probes that include
specifically-hybridizable fragments of SEQ ID NOs: 12,822 -
25,434, wherein the fragment hybridizes at high stringency
to an expressed, human gene. In particular embodiments, the
invention provides single exon probes comprising SEQ ID
NOs: 1 - 12,821.

Accordingly, in a third aspect of the invention,
there is provided a single exon nucleic acid probe for
measuring human gene expression in a sample derived from
human brain which is a nucleic acid molecule comprising a
nucleotide sequence as set out in any of SEQ ID NOs: 1 -
12,821 or a complementary sequence or a fragment thereof
wherein said probe hybridizes at high stringency to a
nucleic acid expressed in the human brain.

In one embodiment, a single exon nucleic acid
probe in accordance with the third aspect comprises a
nucleotide sequence as set out in any of SEQ ID NOs:
12,822 - 25,434 or a complementary sequence or a fragment
thereof.

In a fourth aspect of the invention, there is
provided a single exon nucleic acid probe for measuring
human gene expression in a sample derived from human brain
which is a nucleic acid molecule having a sequence encoding
a peptide comprising a peptide sequence as set out in any
of SEQ ID NOs: 25,435 - 37,811 or a complementary sequence
or a fragment thereof wherein said probe hybridizes at high
stringency to a nucleic acid expressed in the human brain.

Preferably, a single exon nucleic acid probe in
accordance with the third or fourth aspects of the
invention comprises between at least 15 and 50 contiguous
nucleotides of said SEQ ID NO. It is preferred that the
single exon nucleic acid probe comprises at least 15,
suitably at least 20, more suitably at least 25 or
preferably at least 50 contiguous nucleotides of said SEQ
ID NO.

In another preferred embodiment, a single exon
nucleic acid probe in accordance with the third or fourth
aspects of the invention is between 3 kb and 25 kb in
length. It is preferred that said probe is no more than
3 kb, suitably no more than 5 kb, more suitably no more than
10 kb, preferably 15 kb, more preferably 20 kb or, most
preferably, no more than 20 kb in length.

Preferably, a single exon nucleic acid probe in 
accordance with either the fifth or sixth aspect of the 
invention is DNA, preferably single-stranded DNA, RNA or 
PNA. 

In another embodiment of either the third or
fourth aspect of the invention, a single exon nucleic acid
probe is detectably labeled. Suitable detectable labels
include a radionuclide, a fluorescent label or a first
member of a specific binding pair. Suitable fluorescent
labels include dyes such as cyanine dyes, preferably Cy3
and Cy5, although other suitable dyes will be known to those
skilled in the art.

In a particularly preferred embodiment, a single
exon nucleic acid probe in accordance with either the third
or fourth aspect of the invention lacks prokaryotic and
bacteriophage vector sequence. In yet another embodiment, a
single exon nucleic acid probe in accordance with either
the third or fourth aspect of the invention lacks
homopolymeric stretches of A or T.

In a fifth aspect of the invention, there is
provided an amplifiable nucleic acid composition,
comprising:

the single exon nucleic acid probe in accordance
with either of the third or fourth aspects of the
invention; and at least one nucleic acid primer;

wherein said at least one primer is sufficient to
prime enzymatic amplification of said probe.

In a sixth aspect of the invention, there is
provided a method of measuring gene expression in a sample
derived from human brain, comprising:

contacting the single exon microarray in
accordance with the second aspect of the invention, with a
first collection of detectably labeled nucleic acids, said
first collection of nucleic acids derived from mRNA of
human brain; and then

measuring the label detectably bound to each
probe of said microarray.

In a seventh aspect of the invention, there is
provided a method of identifying exons in a eukaryotic
genome, comprising:

algorithmically predicting at least one exon from
genomic sequence of said eukaryote; and then

detecting specific hybridization of detectably
labeled nucleic acids to a single exon probe,

wherein said detectably labeled nucleic acids are
derived from mRNA from the brain of said eukaryote, said
probe is a single exon probe having a fragment identical in
sequence to, or complementary in sequence to, said
predicted exon, said probe is included within a single exon
microarray in accordance with the first aspect of the
invention, and said fragment is selectively hybridizable at
high stringency.

In an eighth aspect of the invention, there is
provided a method of assigning exons to a single gene,
comprising:

identifying a plurality of exons from genomic
sequence in accordance with the seventh aspect of the
invention; and then

measuring the expression of each of said exons in
a plurality of tissues and/or cell types using
hybridization to single exon microarrays having a probe
with said exon,

wherein a common pattern of expression of said
exons in said plurality of tissues and/or cell types
indicates that the exons should be assigned to a single
gene.
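
By way of illustration only, one way of recognizing a
"common pattern of expression" is to compare the expression
profiles of predicted exons across tissues. The Python
sketch below is an assumption made for the example and is
not the method of the invention: it uses Pearson
correlation, an arbitrary threshold of 0.9, a simple greedy
grouping, and made-up exon names and signal values.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def group_exons(profiles, threshold=0.9):
    """Greedily group exons whose profiles correlate with a group member.

    profiles: dict mapping exon name -> list of per-tissue signal values.
    An exon joins the first existing group containing a member whose
    profile correlates with it at or above the threshold; otherwise it
    starts a new group.
    """
    groups = []
    for name, profile in profiles.items():
        for group in groups:
            if any(pearson(profile, profiles[m]) >= threshold for m in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical signal values for three predicted exons in four tissues.
profiles = {
    "exon_1": [1.0, 5.0, 0.5, 3.0],
    "exon_2": [2.0, 10.0, 1.0, 6.0],  # same pattern as exon_1, scaled
    "exon_3": [4.0, 0.5, 6.0, 1.0],   # opposite pattern
}
groups = group_exons(profiles)
```

Here the first two exons rise and fall together across
tissues and are placed in one candidate gene, while the
third is kept apart.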

In a ninth aspect of the invention, there is
provided a nucleic acid sequence as set out in any of SEQ
ID NOs: 1 - 25,434 wherein said sequence encodes a peptide.

In a tenth aspect of the invention, there is
provided a peptide encoded by a sequence comprising a
sequence as set out in any of SEQ ID NOs: 12,822 - 25,434,
or a complementary sequence or coding portion thereof.

In a preferred embodiment, a peptide may be
encoded by a sequence comprising a sequence set out in any
of SEQ ID NOs: 1 - 12,821.

In a further aspect, the invention provides
peptides comprising an amino acid sequence translated from
the DNA fragments, said amino acid sequences comprising SEQ
ID NOs: 25,435 - 37,811.

Accordingly, in an eleventh aspect of the invention
there is provided a peptide comprising a sequence as set
out in any of SEQ ID NOs: 25,435 - 37,811, or fragment
thereof.

In another aspect, the invention provides means
for displaying annotated sequence, and in particular, for
displaying sequence annotated according to the methods and
apparatus of the present invention. Further, such display
can be used as a preferred graphical user interface for
electronic search, query, and analysis of such annotated
sequence.



Detailed Description of the Invention 


Definitions 

As used herein, the term "microarray" and phrase
"nucleic acid microarray" refer to a substrate-bound
collection of plural nucleic acids, hybridization to each
of the plurality of bound nucleic acids being separately
detectable. The substrate can be solid or porous, planar
or non-planar, unitary or distributed.

As so defined, the term "microarray" and phrase
"nucleic acid microarray" include all the devices so called
in Schena (ed.), DNA Microarrays: A Practical Approach
(Practical Approach Series), Oxford University Press (1999)
(ISBN: 0199637768); Nature Genet. 21(1)(suppl):1 - 60
(1999); and Schena (ed.), Microarray Biochip: Tools and
Technology, Eaton Publishing Company/BioTechniques Books
Division (2000) (ISBN: 1881299376). As so defined, the
term "microarray" and phrase "nucleic acid microarray"
further include substrate-bound collections of plural
nucleic acids in which the nucleic acids are distributably
disposed on a plurality of beads, rather than on a unitary
planar substrate, as is described, inter alia, in Brenner
et al., Proc. Natl. Acad. Sci. USA 97(4):1665-1670 (2000);
in such case, the term "microarray" and phrase "nucleic
acid microarray" refer to the plurality of beads in
aggregate.

As used herein with respect to a nucleic acid
microarray, the term "probe" refers to the nucleic acid
that is, or is intended to be, bound to the substrate; in
such context, the term "target" thus refers to nucleic acid
intended to be bound thereto by Watson-Crick
complementarity. As used herein with respect to solution
phase hybridization, the term "probe" refers to the nucleic
acid of known sequence that is detectably labeled.

As used herein, the expression "probe comprising
SEQ ID NO.", and variants thereof, intends a nucleic acid
probe, at least a portion of which probe has either (i) the
sequence directly as given in the referenced SEQ ID NO., or
(ii) a sequence complementary to the sequence as given in
the referenced SEQ ID NO., the choice as between sequence
directly as given and complement thereof dictated by the
requirement that the probe hybridize to mRNA.

As used herein, the term "open reading frame" and
the equivalent acronym "ORF" refer to that portion of an
exon that can be translated in its entirety into a sequence
of contiguous amino acids, i.e., a nucleic acid sequence
that, in at least one reading frame, does not possess stop
codons; the term does not require that the ORF encode the
entirety of a natural protein.
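
By way of illustration only, the "no stop codon in at least
one reading frame" test can be sketched in Python as
follows. The function names are hypothetical, and the
sketch assumes a DNA sequence, the standard stop codons
(TAA, TAG, TGA) and the three forward reading frames only;
the complementary strand is not examined.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def stop_free_frames(seq):
    """Return the forward reading frames (0, 1, 2) that contain no stop codon."""
    seq = seq.upper()
    frames = []
    for frame in range(3):
        # Walk complete codons only; a trailing partial codon is ignored.
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            frames.append(frame)
    return frames


def is_open(seq):
    """True if at least one forward reading frame is free of stop codons."""
    return bool(stop_free_frames(seq))


# Toy sequences, not taken from the Sequence Listing.
open_frames = stop_free_frames("ATGGCCTAAGGC")   # stop in frame 0 only
closed_frames = stop_free_frames("TAACTAGCTGA")  # a stop in every frame
```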

As used herein, the term "amplicon" refers to a
PCR product amplified from human genomic DNA, containing
the predicted exon.

As used herein, the term "exon" refers to the
consensus prediction of the various exon and gene
predicting algorithms, i.e., a nucleic acid sequence
bioinformatically predicted to encode a portion of a
natural protein.

As used herein, the term "peptide" refers to a
sequence of amino acids. The sequences referred to as
PEPTIDE SEQ ID NOS. are the predicted peptide sequences
that would be translated from one of the exons, or a
portion thereof, set out in exon SEQ ID NOS. The codons
encoding the peptide are wholly contained within the exon.

As used herein, "portions" of a defined
nucleotide sequence or sequences can be and, preferably,
are fragments unique to that sequence or to one or a
combination of those sequences. A fragment unique to a
nucleic acid molecule is one that is a signature for the
larger nucleic acid molecule.

As used herein, the phrase "expression of a
probe" and its linguistic variants means that the ORF
present within the probe, or its complement, is present
within a target mRNA.

As used herein, "stringent conditions" refers to
parameters well known to those skilled in the art. When a
nucleic acid molecule is said to be hybridisable to another
of a given sequence under "stringent conditions" it is
meant that it is homologous to the given sequence.

As used herein, the phrase "specific binding
pair" intends a pair of molecules that bind to one another
with high specificity. Binding pairs are said to exhibit
specific binding when they exhibit avidity of at least 10^7,
preferably at least 10^8, more preferably at least 10^9
liters/mole. Nonlimiting examples of specific binding
pairs are: antibody and antigen; biotin and avidin; and
biotin and streptavidin.

As used herein with respect to the visual display
of annotated genomic sequence, the term "rectangle" means
any geometric shape that has at least a first and a second
border, wherein the first and second borders each are
capable of mapping uniquely to a point of another visual
object of the display.

As used herein, a "Mondrian" means a visual
display in which a single genomic sequence is annotated
with predicted and experimentally confirmed functional
information.

Brief Description of the Drawings

The present invention is further illustrated with
reference to the following non-limiting figures and
examples in which:

FIG. 1 illustrates a process for predicting
functional regions from genomic sequence, confirming the
functional activity of such regions experimentally, and
associating and displaying the data so obtained in
meaningful and useful relationship to the original sequence
data;

FIG. 2 further elaborates that portion of the
process schematized in FIG. 1 for predicting functional
regions from genomic sequence;

FIG. 3 illustrates a Mondrian visual display; 

FIG. 4 presents a Mondrian showing a hypothetical 
annotated genomic sequence; 

FIG. 5 is a histogram showing the distribution of
ORF length and PCR products as obtained, with ORF length
shown in black and PCR product length shown in dotted
lines;

FIG. 6 is a histogram showing the distribution,
among exons predicted according to the methods described,
of expression as measured using simultaneous two color
hybridization to a genome-derived single exon microarray.
The graph shows the number of sequence-verified products
that were either not expressed ("0"), expressed in one or
more but not all tested tissues ("1" - "9"), or expressed
in all tissues tested ("10");
FIG. 7 is a pictorial representation of the
expression of verified sequences that showed expression
with signal intensity greater than 3 in at least one
tissue, with: FIG. 7A showing the expression as measured by
microarray hybridization in each of the 10 measured
tissues, and the expression as measured "bioinformatically"
by query of EST, NR and SwissProt databases; with FIG. 7B
showing the legend for display of physical expression
(ratio) in FIG. 7A; and with FIG. 7C showing the legend for
scoring EST hits as depicted in FIG. 7A;

FIG. 8 shows a comparison of normalized Cy3
signal intensity for arrayed sequences that were identical
to sequences in existing EST, NR and SwissProt databases or
that were dissimilar (unknown), where black denotes the
signal intensity for all sequence-verified products with a
BLAST Expect ("E") value of greater than 1e-30 (1 x 10^-30)
("unknown") and a dotted line denotes sequence-verified
spots with a BLAST Expect ("E") value of less than 1e-30
(1 x 10^-30) ("known");

FIG. 9 presents a Mondrian of BAC AC008172 (bases
25,000 to 130,000), containing the carbamyl phosphate
synthetase gene (AF154830.1); and

FIG. 10 is a Mondrian of BAC A049839. 




Methods and Apparatus for Predicting, Confirming, 
Annotating, and Displaying Functional Regions From Genomic 
Sequence Data 

FIG. 1 is a flow chart illustrating in broad
outline a process for predicting functional regions from
genomic sequence, confirming and characterizing the
functional activity of such regions experimentally, and
then associating and displaying the information so obtained
in meaningful and useful relationship to the original
sequence data.

The initial input into process 10 of the present
invention is drawn from one or more databases 100
containing genomic sequence data. Because genomic sequence
is usually obtained from subgenomic fragments, the sequence
data typically will be stored in a series of records
corresponding to these subgenomic sequenced fragments.
Some fragments will have been catenated to form larger
contiguous sequences ("contigs"); others will not. A
finite percentage of sequence data in the database will
typically be erroneous, consisting inter alia of vector
sequence, sequence created from aberrant cloning events,
sequence of artificial polylinkers, and sequence that was
erroneously read.

Each sequence record in database 100 will
minimally contain as annotation a unique sequence
identifier (accession number), and will typically be
annotated further to identify the date of accession,
species of origin, and depositor. Because database 100 can
contain nongenomic sequence, each sequence will typically
be annotated further to permit query for genomic sequence.
Chromosomal origin, optionally with map location, can also
be present. Data can be, and over time increasingly will
be, further annotated with additional information, in part
through use of the present invention, as described below.
Annotation can be present within the data records, in
information external to database 100 and linked to the
records thereof, or through a combination of the two.

Databases useful as genomic sequence database 100
in the present invention include GenBank, and particularly
include several divisions thereof, including the
htgs (draft), NT (nucleotide, command line), and NR
(nonredundant) divisions. GenBank is produced by the
National Institutes of Health and is maintained by the
National Center for Biotechnology Information (NCBI).
Databases of genomic sequence from species other than
human, such as mouse, rat, Arabidopsis, C. elegans, C.
briggsae, Drosophila, zebra fish, and other higher
eukaryotic organisms will also prove useful as genomic
sequence database 100.

Genomic sequence obtained by query of genomic 
sequence database 100 is then input into one or more 
processes 200 for identification of regions therein that 
are predicted to have a biological function as specified by
the user. Such functions include, but are not limited to,
encoding protein, regulating transcription, regulating
message transport after transcription into mRNA, regulating
message splicing after transcription into mRNA, or
regulating message degradation after transcription into
mRNA, and the like. Other functions include directing
somatic recombination events, contributing to chromosomal 
stability or movement, contributing to allelic exclusion or 
X chromosome inactivation, and the like. 

The particular genomic sequence to be input into
process 200 will depend upon the function for which
relevant sequence is to be identified as well as upon the
approach chosen for such identification. Process step 200
can be iterated to identify different functions within a
given genomic region. In such case, the input often will
be different for the several iterations.

Sequences predicted to have the requisite 
function by process 200 are then input into process 300, 
where a subset of the input sequences suitable for 
experimental confirmation is identified. Experimental
confirmation can involve physical and/or bioinformatic
assay. Where the subsequent experimental assay is
bioinformatic, rather than physical, there are fewer
constraints on the sequences that can be tested, and in
this latter case therefore process 300 can output the
entirety of the input sequence.

The subset of sequences output from process 300
is then used in process 400 for experimental verification
and characterization of the function predicted in
process 200, which experimental verification can, and often
will, include both physical and bioinformatic assay.

Process 500 annotates the sequence data with the 
functional information obtained in the physical and/or 
bioinformatic assays of process 400. Such annotation can 
be done using any technique that usefully relates the
functional information to the sequence, as, for example, by
incorporating the functional data into the sequence data
record itself, by linking records in a hierarchical or
relational database, by linking to external databases, by a
combination thereof, or by other means well known within
the database arts. The data can even be submitted for
incorporation into databases maintained by others, such as
GenBank, which is maintained by NCBI. 

As further noted in FIG. 1, additional annotation 
can be input into process 500 from external sources 600. 

The annotated data is then displayed in process
800, either before, concomitantly with, or after optional
storage 700 on nontransient media, such as magnetic disk, 
optical disc, magnetooptical disk, flash memory, or the 
like. 

FIG. 1 shows that the experimental data output
from process 400 can be used in each preceding step of
process 10: e.g., facilitating identification of functional
sequences in process 200, facilitating identification of an
experimentally suitable subset thereof in process 300, and
facilitating creation of physical and/or informational
substrates for, and performance of subsequent assay, of 
functional sequences in process 400. 

Information from each step can be passed directly 
to the succeeding process, or stored in permanent or
interim form prior to passage to the succeeding process.
Often, data will be stored after each, or at least a
plurality, of such process steps. Any or all process steps 
can be automated. 

FIG. 2 further elaborates the prediction of 
functional sequence within genomic sequence according to
process 200. 

Genomic sequence database 100 is first queried 20 
for genomic sequence. 

The sequence required to be returned by query 20 
will depend, in the first instance, upon the function to be
identified. 

For example, genomic sequences that function to 
encode protein can be identified inter alia using gene 
prediction approaches, comparative sequence analysis
approaches, or combinations of the two. In gene prediction
analysis, sequence from one genome is input into process
200 where at least one, preferably a plurality, of
algorithmic methods are applied to identify putative coding
regions. In comparative sequence analysis, by contrast,
corresponding, e.g., syntenic, sequence from a plurality of
sources, typically a plurality of species, is input into 
process 200, where at least one, possibly a plurality, of 
algorithmic methods are applied to compare the sequences 
and identify regions of least variability. 

The exact content of query 20 will also depend
upon the database queried. For example, if the database
contains both genomic and nongenomic sequence, perhaps
derived from multiple species, and the function to be
determined is protein coding regions in human genomic
sequence, the query will accordingly require that the
sequence returned be genomic and derived from humans.

Query 20 can also incorporate criteria that 
compel return of sequence that meets operative requirements 
of the subsequent analytical method. Alternatively, or in
addition, such operative criteria can be enforced in
subsequent preprocess step 24.

For example, if the function sought to be 
identified is protein coding, query 20 can incorporate 
criteria that return from genomic sequence database 100 
only those sequences present within contigs sufficiently
long as to have obviated substantial fragmentation of any 
given exon among a plurality of separate sequence 
fragments. 

Such criteria can, for example, consist of a
required minimal individual genomic sequence fragment
length, such as 10 kb, more typically 20 kb, 30 kb, 40 kb,
and preferably 50 kb or more, as well as an optional
further or alternative requirement that sequence from any
given clone, such as a bacterial artificial chromosome
("BAC"), be presented in no more than a finite maximal
number of fragments, such as no more than 20 separate
pieces, more typically no more than 15 fragments, even more
typically no more than about 10 - 12 fragments.

Results using the present invention have shown
that genomic sequence from bacterial artificial chromosomes
(BACs) is sufficient for gene prediction analysis according
to the present invention if the sequence is at least 50 kb
in length, and if additionally the sequence from any given
BAC is presented in fewer than 15, and preferably fewer
than 10, fragments. Accordingly, query 20 can incorporate
a requirement that data accessioned from BAC sequencing be 
in fewer than 15, preferably fewer than 10, fragments. 
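
By way of illustration only, the length and fragmentation criteria above can be sketched in Python as follows. The sketch assumes that each clone is represented simply as a list of fragment lengths in base pairs; the function names, parameter names, and clone identifiers are hypothetical and are not part of any database interface. The length criterion is applied here to the summed sequence of the clone, which is one possible reading; it could equally be applied to each individual fragment.

```python
from typing import Dict, List


def passes_query_criteria(fragment_lengths: List[int],
                          min_total_bp: int = 50_000,
                          max_fragments: int = 15) -> bool:
    """Return True if a clone's sequence is long enough and not too fragmented.

    fragment_lengths holds the lengths (bp) of the separate pieces in which
    the clone's sequence is presented. Applying the length criterion to the
    summed length is an assumption of this sketch.
    """
    if not fragment_lengths:
        return False
    return (sum(fragment_lengths) >= min_total_bp
            and len(fragment_lengths) < max_fragments)


def select_clones(clones: Dict[str, List[int]], **criteria) -> List[str]:
    """Return the identifiers of clones that satisfy the criteria, sorted."""
    return sorted(cid for cid, frags in clones.items()
                  if passes_query_criteria(frags, **criteria))


# Hypothetical clone identifiers and fragment lengths, for illustration only.
example_clones = {
    "clone_A": [120_000],          # one long contig
    "clone_B": [4_000] * 20,       # 80 kb, but in 20 pieces
    "clone_C": [30_000, 15_000],   # only 45 kb in total
    "clone_D": [10_000] * 9,       # 90 kb in 9 pieces
}
```

Passing `max_fragments=10` would apply the stricter, preferred fragment limit.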

An additional criterion that can be incorporated
into the query can be the date, or range of dates, of
sequence accession. Although the process has been
described above as if genomic sequence database 100 were
static, it is of course understood that the genomic
sequence databases need not be static, and indeed are
typically updated on a frequent, even hourly, basis. Thus,
as further described in Examples 1 and 2, infra, it is
possible to query the database for newly added sequence,
either newly added after an absolute date, or newly added
relative to a prior analysis performed using the methods
and apparatus of the present invention. In this way, the
process herein described can incorporate a dynamic,
temporal component.
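
The two temporal modes just described, selection after an absolute date and selection relative to a prior analysis, might be sketched as below. This is a minimal illustration under stated assumptions: the `SequenceRecord` type, the `newly_added` function, and the record identifiers are hypothetical, and a real database would normally apply such criteria within its own query language.

```python
from datetime import date
from typing import Iterable, List, NamedTuple, Optional, Set


class SequenceRecord(NamedTuple):
    accession: str        # unique sequence identifier
    accession_date: date  # date the record was added to the database


def newly_added(records: Iterable[SequenceRecord],
                after: Optional[date] = None,
                previously_seen: Optional[Set[str]] = None
                ) -> List[SequenceRecord]:
    """Keep records added after an absolute date and/or absent from a prior run."""
    seen = previously_seen or set()
    selected = []
    for rec in records:
        if after is not None and rec.accession_date <= after:
            continue  # not newer than the absolute cutoff
        if rec.accession in seen:
            continue  # already handled in an earlier analysis
        selected.append(rec)
    return selected


# Hypothetical records, for illustration only.
records = [
    SequenceRecord("REC0001", date(2000, 1, 10)),
    SequenceRecord("REC0002", date(2000, 2, 20)),
    SequenceRecord("REC0003", date(2000, 3, 5)),
]
```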

One utility of such temporal limitation is to 
identify, from newly accessioned genomic sequence, the 
presence of novel genes, particularly those not previously
identified by EST sequencing (or other sequencing efforts
that are similarly based upon gene expression). As further
described in Example 1, such an approach has shown that
newly accessioned human genomic sequence, when analyzed for
sequences that function to encode protein, readily
identifies genes that are novel over those in existing EST
and other expression databases. This makes the methods of
the present invention extremely powerful gene discovery
tools. And as would be appreciated, such gene discovery
can be performed using genomic sequence from species other
than human.

If query 20 incorporates multiple criteria, such
as above-described, the multiple criteria can be applied
as a series of separate queries or as a single query,
depending in part upon the query language, the complexity
of the query, and other considerations well known in the
database arts. 

If query 20 returns no genomic sequence meeting 
the query criteria, the negative result can be reported by 
process 22, and process 200 (and indeed, entire process 10)
ended 23, as shown. Alternatively, or in addition to
report and termination of the initial inquiry, a new query
20 can be generated that takes into account the initial 
negative result. 

When query 20 returns sequence meeting the query
criteria, the returned sequence is then passed to optional
preprocessing 24, suitable and specific for the desired
analytical approach and the particular analytical methods 
thereof to be used in process 25. 

Preprocessing 24 can include processes suitable 
for many approaches and methods thereof, as well as
processes specifically suited for the intended subsequent
analysis. 

Preprocessing 24 suitable for most approaches and 
methods will include elimination of sequence irrelevant to,
or that would interfere with, the subsequent analysis.
Such sequence includes repetitive sequence, such as Alu
repeats and LINE elements, vector sequence, artificial
sequence, such as artificial polylinkers, and the like.
Such removal can readily be performed by identification and
subsequent masking of the undesired sequence.

Identification can be effected by comparing the 
genomic sequence returned by query 20 with public or 
private databases containing known repetitive sequence, 
vector sequence, artificial sequence, and other artifactual
sequence. Such comparison can readily be done using
programs well known in the art, such as CROSS_MATCH, or by
proprietary sequence comparison programs the engineering of 
which is well within the skill in the art. 

Alternatively, or in addition, undesirable,
including artifactual, sequence can be identified
algorithmically without comparison to external databases
and thereafter removed. For example, synthetic polylinker
sequence can be identified by an algorithm that identifies
a significantly higher than average density of known
restriction sites. As another example, vector sequence can
be identified by algorithms that identify nucleotide or 
codon usage at variance with that of the bulk of the 
genomic sequence. 
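
One possible form of the restriction-site density approach is sketched below, purely as an illustration. The list of recognition sequences, the window size, the step, the fold threshold, and the minimum site count are all assumptions chosen for the example rather than values taken from this description, and the function names are hypothetical.

```python
from typing import List, Tuple

# Recognition sequences of a few common restriction enzymes (EcoRI, BamHI,
# HindIII, PstI, SalI, XbaI, KpnI, SmaI, SacI); the choice is illustrative.
RESTRICTION_SITES = ("GAATTC", "GGATCC", "AAGCTT", "CTGCAG", "GTCGAC",
                     "TCTAGA", "GGTACC", "CCCGGG", "GAGCTC")


def site_positions(seq: str) -> List[int]:
    """Return sorted start positions of every listed site in seq."""
    seq = seq.upper()
    hits = []
    for site in RESTRICTION_SITES:
        start = seq.find(site)
        while start != -1:
            hits.append(start)
            start = seq.find(site, start + 1)
    return sorted(hits)


def polylinker_like_regions(seq: str, window: int = 50, step: int = 10,
                            fold: float = 3.0, min_sites: int = 3
                            ) -> List[Tuple[int, int]]:
    """Return merged [start, end) windows with unusually dense sites.

    A window is flagged when it holds at least min_sites site starts and its
    density exceeds fold times the average density over the whole sequence.
    """
    hits = site_positions(seq)
    if not hits or len(seq) < window:
        return []
    average = len(hits) / len(seq)  # sites per base overall
    regions: List[Tuple[int, int]] = []
    for start in range(0, len(seq) - window + 1, step):
        end = start + window
        count = sum(1 for h in hits if start <= h < end)
        if count >= min_sites and count / window > fold * average:
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend previous region
            else:
                regions.append((start, end))
    return regions
```

Because the threshold is relative to the average over the whole input, a record consisting almost entirely of polylinker would not be flagged by this sketch; an absolute threshold would be needed for that case.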

Once identified, undesired sequence can be
removed. Removal can usefully be done by masking the
undesired sequence as, for example, by converting the
specific nucleotide references to one that is unrecognized
by the subsequent bioinformatic algorithms, such as "X".
Alternatively, but at present less preferred, the undesired
sequence can be excised from the returned genomic sequence,
leaving gaps.
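
Masking of this kind can be illustrated with a short sketch. It assumes that the undesired regions have already been identified and are supplied as half-open (start, end) coordinate pairs; the function name and its parameters are hypothetical.

```python
from typing import Iterable, Tuple


def mask_intervals(seq: str, intervals: Iterable[Tuple[int, int]],
                   mask_char: str = "X") -> str:
    """Replace every base inside the half-open [start, end) intervals.

    The masked sequence keeps its original length, so coordinates of the
    remaining sequence are unchanged.
    """
    bases = list(seq)
    for start, end in intervals:
        start = max(start, 0)
        end = min(end, len(bases))
        for i in range(start, end):
            bases[i] = mask_char
    return "".join(bases)
```

Overlapping intervals and intervals that run past the end of the sequence are tolerated; one practical convenience of masking is that downstream coordinates still refer to the original record.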

Preprocessing 24 can further include selection 
from among duplicative sequences of that one sequence of 
highest quality. Higher quality can be measured as a lower
percentage of, fewest number of, or least densely clustered
occurrence of ambiguous nucleotides, defined as those
nucleotides that are identified in the genomic sequence
using symbols indicating ambiguity. Higher quality can
also or alternatively be valued by presence in the longest
contig.
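
A minimal sketch of such a selection follows. It assumes that ambiguity is expressed with the IUPAC nucleotide ambiguity symbols and that duplicates are supplied as a mapping from identifier to sequence; the function names are hypothetical, and sequence length is used only as a stand-in for presence in the longest contig.

```python
from typing import Dict

AMBIGUITY_CODES = set("RYSWKMBDHVN")  # IUPAC symbols other than A, C, G, T


def ambiguous_fraction(seq: str) -> float:
    """Fraction of positions written with an ambiguity symbol."""
    if not seq:
        return 1.0  # treat an empty sequence as worst quality
    return sum(base in AMBIGUITY_CODES for base in seq.upper()) / len(seq)


def best_duplicate(candidates: Dict[str, str]) -> str:
    """Return the identifier of the duplicate with the fewest ambiguities.

    Ties are broken in favor of the longer sequence, used here as a stand-in
    for presence in the longest contig.
    """
    return min(candidates,
               key=lambda cid: (ambiguous_fraction(candidates[cid]),
                                -len(candidates[cid])))
```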

Preprocessing 24 can, and often will, also 
include formatting of the data as specifically appropriate 
for passage to the analytical algorithms of process 25. 
Such formatting can and typically will include, inter alia,
addition of a unique sequence identifier, either derived
from the original accession number in genomic sequence
database 100, or newly applied, and can further include
additional annotation. Formatting can include conversion
from one to another sequence listing standard, such as
conversion to or from FASTA or the like, depending upon the
input expected by the subsequent process. 
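
As a simple illustration of such formatting, a sequence and its identifier can be written as a FASTA record as sketched below. The function name, the optional description field, and the 60-character line width are assumptions of the sketch.

```python
def to_fasta(identifier: str, seq: str, description: str = "",
             width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width sequence lines."""
    header = f">{identifier} {description}".rstrip()
    # Split the sequence into consecutive chunks of at most `width` characters.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([header] + lines) + "\n"
```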

Preprocessing, which can be optional depending
upon the function desired to be identified and the
informational requirements of the methods for effecting
such identification, is followed by sequence processing 25,
where sequences with the desired function are identified 
within the genomic sequence. 

As mentioned above, such functions can include, 
but are not limited to, encoding protein, regulating
transcription, regulating message transport after
transcription into mRNA, regulating message splicing after
transcription, or regulating message degradation, and the
like. Other functions include directing somatic
recombination events, contributing to chromosomal stability
or movement, contributing to allelic exclusion or X
chromosome inactivation, or the like. 

The methods of the present invention are 
particularly useful for gene discovery, that is, for 
identifying, from genomic sequence, regions that function
to encode genes, and in a particularly useful embodiment,
for identifying regions that function to encode genes not
hitherto identified by expression-based or directed cloning
and sequencing. In conjunction with verification using the
novel single exon microarrays of the present invention, as
further described below, the methods herein described
become powerful gene discovery tools. 

Accordingly, in a preferred embodiment of the 
present invention, process 25 is used to identify putative 
coding regions. Two preferred approaches in process 25 for
identifying sequence that encodes putative genes are gene
prediction and comparative sequence analysis. 

Gene prediction can be performed using any of a
number of algorithmic methods, embodied in one or more
software programs, such as GRAIL, DICTION, and GENEFINDER,
that identify open reading frames (ORFs) using a variety of
heuristics. Comparative sequence analysis similarly can be
performed using any of a variety of known programs that 
identify regions with lower sequence variability. 

As further described in Example 1, below, gene
finding software programs yield a range of results. For
the newly accessioned human genomic sequence input in
Example 1, for example, GRAIL identified the greatest
percentage of genomic sequence as putative coding region,
2% of the data analyzed; GENEFINDER was second, calling 1%;
and DICTION yielded the least putative coding region, with
0.8% of genomic sequence called as coding region.

Increased reliability can be obtained when 
consensus is required among several such methods. Although 
discussed herein particularly with respect to exon calling, 
consensus among methods will in general increase
reliability of predicting other functions as well. 

Thus, as indicated by query 26, sequence 
processing 25, optionally with preprocessing 24, can be 
repeated with a different method, with consensus among such 
iterations determined and reported in process 27.

Process 27 compares the several outputs for a 
given input genomic sequence and identifies consensus among 
the separately reported results. The consensus itself, as 
well as the sequence meeting that consensus, is then stored 
in process 29a, displayed in process 29b, and/or output to
process 300 for subsequent identification of a subset 
thereof suitable for assay. 

Multiple levels of consensus can be calculated 
and reported by process 27. For example, as further 
described in Example 1, infra, process 27 can report
consensus as between all specific pairs of methods of gene
prediction, as consensus among any one or more of the pairs
of methods of gene prediction, or as among all of the gene
prediction algorithms used. Thus, in Example 1, process 27
reported that GRAIL and GENEFINDER programs agreed on 0.7%
of genomic sequence, that GRAIL and DICTION agreed on 0.5%
of genomic sequence, and that the three programs together
agreed on 0.25% of the data analyzed. Put another way,
0.25% of the genomic sequence was identified by all three
of the programs as containing putative coding region.
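
The calculation of consensus at several levels can be sketched as follows. The sketch assumes that each program's output has been reduced to a list of half-open (start, end) intervals on a common coordinate system; the function names and the example program labels are hypothetical, and the code does not reproduce the output format of any particular gene prediction program.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in base coordinates


def called_bases(intervals: List[Interval]) -> Set[int]:
    """Expand a program's predicted coding intervals into a set of positions."""
    bases: Set[int] = set()
    for start, end in intervals:
        bases.update(range(start, end))
    return bases


def consensus_report(calls: Dict[str, List[Interval]],
                     sequence_length: int) -> Dict[Tuple[str, ...], float]:
    """Percentage of the sequence called as coding by every combination of
    programs: each alone, each pair, and so on up to all programs together."""
    if sequence_length <= 0:
        raise ValueError("sequence_length must be positive")
    per_program = {name: called_bases(iv) for name, iv in calls.items()}
    names = sorted(per_program)
    report: Dict[Tuple[str, ...], float] = {}
    for size in range(1, len(names) + 1):
        for group in combinations(names, size):
            shared = set.intersection(*(per_program[n] for n in group))
            report[group] = 100.0 * len(shared) / sequence_length
    return report
```

Expanding intervals into per-base sets is simple but memory-hungry; for long genomic records an interval-intersection routine would be the more practical choice.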

Furthermore, consensus can be required among 
different approaches to identifying a chosen function. 

For example, if the function desired to be 
identified is coding of protein sequence, and a first used 
approach to exon calling is gene prediction, the process
can be repeated on the same input sequence, or subset
thereof, with another approach, such as comparative
sequence analysis. In such a case, where comparative
sequence analysis follows gene prediction, the comparison
can be performed not only on genomic nucleic acid sequence,
but additionally or alternatively can be performed on the
predicted amino acid sequence translated from the ORFs
previously identified by the gene prediction approach.

Although shown as an iterative process, the
multiple analyses required to achieve consensus can be done
in series, in parallel, or some combination thereof. 

Predicted functional sequence, optionally 
representing a consensus among a plurality of methods and 
approaches for determination thereof, is passed to process
300 for identification of a subset thereof for functional
assay. 

In the preferred embodiment of the methods of the 
present invention, wherein the function sought to be 
identified is protein coding, process 300 is used to
identify a subset thereof suitable for experimental
verification by physical and/or bioinformatic approaches.

For example, putative ORFs identified in process
200 can be classified, or binned, bioinformatically into
putative genes. This binning can be based inter alia upon
consideration of the average number of exons/gene in the
species chosen for analysis, upon density of exons that
have been called on the genomic sequence, and other
empirical rules. Thereafter, one or more among the
gene-specific ORFs can be chosen for subsequent use in gene
expression assay.

Where such subsequent gene expression assay uses 
amplified nucleic acid, considerations such as desired 
amplicon length, primer synthesis requirements, putative 
exon length, sequence GC content, existence of possible
secondary structure, and the like can be used to identify
and select those ORFs that appear most likely successfully
to amplify. Where subsequent gene expression assay relies
upon nucleic acid hybridization, whether or not using
amplified product, further considerations involving
hybridization stringency can be applied to identify that
subset of sequences that will most readily permit
sequence-specific discrimination at a chosen hybridization
and wash stringency. One particular such consideration is
avoidance of putative exons that span repetitive sequence;
such sequence can hybridize spuriously to nonspecific message,
reducing specific signal in the hybridization. 
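
Some of the considerations above lend themselves to simple computational screens. The sketch below illustrates three of them: minimum length, GC content, and avoidance of masked repetitive sequence. Every threshold shown is a placeholder chosen for the example, not a value taken from this description, and the function names are hypothetical; secondary structure and primer synthesis requirements are not modeled.

```python
from typing import Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    counted = sum(seq.count(b) for b in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def suitable_for_array(orf_seq: str, min_len: int = 100,
                       gc_range: Tuple[float, float] = (0.35, 0.65),
                       mask_char: str = "X") -> bool:
    """Apply three illustrative screens to a predicted ORF."""
    if len(orf_seq) < min_len:
        return False  # shorter than the placeholder minimum length
    if mask_char in orf_seq.upper():
        return False  # overlaps masked (e.g., repetitive) sequence
    low, high = gc_range
    return low <= gc_content(orf_seq) <= high
```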

For bioinformatic assay, there are fewer 
constraints on the sequences that can be tested 
experimentally, and in this latter case therefore process
300 can output the entirety of the input sequence.

The subset of sequences identified by process 300 
as suitable for use in assay is then used in process 400 to 
create the physical and/or informational substrate for 
experimental verification of the predictions made in
process 200, and thereafter to assay those substrates.

As mentioned, the methods of the present 
invention are particularly useful for identifying potential 
coding regions within genomic sequence. In a preferred 
embodiment of process 400, therefore, the expression of the
sequences predicted to encode protein is verified. The
combination of the predictive and experimental methods 
provides a powerful gene discovery engine. 

Thus, in another aspect, the present invention 
provides methods and apparatus for verifying the expression
of putative genes identified within genomic sequence. In
particular, the invention provides a novel method of
verifying gene expression in which expression of predicted
ORFs is measured and confirmed using a novel type of
nucleic acid microarray, the genome-derived single exon
nucleic acid microarrays of the present invention.

Putative ORFs as predicted by a consensus of gene
calling, particularly gene prediction, algorithms in
process 200, and as further identified as suitable by
process 300, are amplified from genomic DNA using the
polymerase chain reaction (PCR). Although PCR is
conveniently used, other amplification approaches can also
be used.

Amplification schemes can be designed to capture 
the entirety of each predicted ORF in an amplicon with 
minimal additional (that is, intronic or intergenic)
sequence. Because ORFs predicted from human genomic 
sequence using the methods of the present invention differ 
in length, such an approach results in amplicons of varying 
length. 

However, most predicted ORFs are shorter than 500
bp in length, and although amplicons of at least about 100
or 200 base pairs can be immobilized as probes on nucleic
acid microarrays, early experimental results using the
methods of the present invention have suggested that longer
amplicons, at least about 400 or 500 base pairs, are more
effective. Furthermore, certain advantages derive from 
application to the microarray of amplicons of defined size. 

Therefore, amplification schemes can 
alternatively, and preferably, be designed to amplify
regions of defined size, preferably at least about 300, 400
or 500 bp, centered about each predicted ORF. Such an 
approach results in a population of amplicons of limited 
size diversity, but that typically contain intronic and/or 
intergenic nucleic acid in addition to putative ORF. 

Conversely, somewhat fewer than 10% of ORFs
predicted from human genomic sequence according to the
methods of the present invention exceed 500 bp in length.
Portions of such extended ORFs, preferably at least about
300, 400 or 500 bp in length, can be amplified. However, it
has been discovered that the percentage success at
amplifying pieces of such ORFs is low, and that such
putative exons are more effectively amplified when larger
fragments, at least about 1000 or 1500 bp, and even as
large as 2000 bp are amplified.

The putative ORFs selected in process 300 are
thus input into one or more primer design programs, such as
PRIMER3 (available online for use at
http://www-genome.wi.mit.edu/cgi-bin/primer/ ), with a goal
of amplifying at least about 500 base pairs of genomic
sequence centered within or about ORFs predicted to be no
more than about 500 bp, or at least about 1000 - 1500 bp of
genomic sequence for ORFs predicted to exceed 500 bp in
length, and the primers synthesized by standard techniques.
Primers with the requisite sequences can be purchased
commercially or synthesized by standard techniques.
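
The choice of target region handed to a primer design program can be illustrated as follows. This is a sketch under stated assumptions: coordinates are zero-based and half-open, the window is centered on the midpoint of the predicted ORF and shifted as needed to stay on the contig, and the function name and parameter names are hypothetical. It computes only the region to be amplified; primer selection itself is left to the primer design program.

```python
from typing import Tuple


def amplicon_window(orf_start: int, orf_end: int, contig_length: int,
                    short_target: int = 500, long_target: int = 1000,
                    long_orf_threshold: int = 500) -> Tuple[int, int]:
    """Return a [start, end) genomic window centered about a predicted ORF.

    ORFs of at most long_orf_threshold bp receive a window of short_target
    bp; longer ORFs receive a window of long_target bp.
    """
    orf_len = orf_end - orf_start
    target = short_target if orf_len <= long_orf_threshold else long_target
    target = min(target, contig_length)       # cannot exceed the contig
    center = (orf_start + orf_end) // 2
    start = center - target // 2
    start = max(0, min(start, contig_length - target))  # stay on the contig
    return start, start + target
```

Setting `long_target=1500` would select the upper end of the range given above for longer ORFs.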

Conveniently, a first predetermined sequence can
be added commonly to the ORF-specific 5' primer and a
second, typically different, predetermined sequence
commonly added to each 3' ORF-unique primer. This serves
to immortalize the amplicon, that is, serves to permit
further amplification of any amplicon using a single set of
primers complementary respectively to the common 5' and
common 3' sequence elements. The presence of these
"universal" priming sequences further facilitates later
sequence verification, providing a sequence common to all
amplicons at which to prime sequencing reactions. The 
common 5' and 3' sequences further serve to add a cloning 
site should any of the ORFs warrant further study. 

Such predetermined sequence is usefully at least
about 10, 12 or 15 nt in length, and usually does not
exceed about 25 nt in length. The "universal" priming 
sequences used in the examples presented infra were each 16 
nt long. 
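
The addition of common priming sequences to ORF-specific primers can be sketched as below. The two 16-nt tails shown are placeholders invented for this sketch and are not the sequences used in the examples; the function name and the length limits passed as parameters are likewise assumptions.

```python
from typing import Tuple


def add_universal_tails(forward_specific: str, reverse_specific: str,
                        forward_tail: str, reverse_tail: str,
                        min_tail: int = 10, max_tail: int = 25
                        ) -> Tuple[str, str]:
    """Prepend common 5' tails to a pair of ORF-specific primers.

    All sequences are written 5' to 3', so prepending places the tail at the
    5' end of each primer, leaving the ORF-specific 3' end free to anneal.
    """
    for tail in (forward_tail, reverse_tail):
        if not min_tail <= len(tail) <= max_tail:
            raise ValueError(f"tail length {len(tail)} outside "
                             f"{min_tail}-{max_tail} nt")
    return forward_tail + forward_specific, reverse_tail + reverse_specific


# Placeholder 16-nt tails invented for this sketch; they are not the
# sequences used in the examples of the specification.
FORWARD_TAIL = "GATCGATCGATCGATC"
REVERSE_TAIL = "CTAGCTAGCTAGCTAG"
```

Because every amplicon then carries the same two flanking sequences, a single primer pair matching the tails suffices for reamplification or for priming sequencing reactions.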

The genomic DNA to be used as substrate for 
amplification will come from the eukaryotic species from
which the genomic sequence data had originally been
obtained, or a closely related species, and can
conveniently be prepared by well known techniques from
somatic or germline tissue or cultured cells of the
organism. See, e.g., Short Protocols in Molecular Biology:
A Compendium of Methods from Current Protocols in
Molecular Biology, Ausubel et al. (eds.), 4th edition
(April 1999), John Wiley & Sons (ISBN: 047132938X) and
Maniatis et al., Molecular Cloning: A Laboratory Manual,
2nd edition (December 1989), Cold Spring Harbor Laboratory
Press (ISBN: 0879693096). Many such prepared genomic DNAs 
are available commercially, with the human genomic DNAs 
additionally having certification of donor informed 
consent. 

Although the intronic and intergenic material
flanking putative coding regions in the amplicons could
potentially interfere with hybridizations during microarray
experiments, we have found, surprisingly, that differential
expression ratios are not significantly affected. Rather,
the predominant effect of exon size is to alter the
absolute signal intensity, rather than its ratio. Equally
surprising, the art had suggested that single exon probes
would not provide sufficient signal intensity for high
stringency hybridization analyses; we find that such probes
not only provide adequate signal, but have substantial
advantages, as herein described. 

After partial purification, as by size exclusion 
spin column, with or without confirmation as to amplicon 
quality as by gel electrophoresis, each amplicon (single
exon probe) is disposed in an array upon a support
substrate.

Methods for creating microarrays by deposition 
and fixation of nucleic acids onto support substrates are 
well known in the art (reviewed by Schena et al., see
above).

Typically, the support substrate will be glass,
although other materials, such as amorphous or crystalline
silicon or plastics, can also be used. Such plastics include
polymethylacrylic, polyethylene, polypropylene,
polyacrylate, polymethylmethacrylate, polyvinylchloride,
polytetrafluoroethylene, polystyrene, polycarbonate,
polyacetal, polysulfone, celluloseacetate,
cellulosenitrate, nitrocellulose, or mixtures thereof.
Typically, the support will be rectangular,
although other shapes, particularly circular disks and even
spheres, present certain advantages. Particularly
advantageous alternatives to glass slides as support
substrates for array of nucleic acids are optical discs, as
described in WO 98/12559.

The amplified nucleic acids can be attached
covalently to a surface of the support substrate or, more
typically, applied to a derivatized surface in a chaotropic
agent that facilitates denaturation and adherence by
presumed noncovalent interactions, or some combination
thereof.

Robotic spotting devices useful for arraying 
nucleic acids on support substrates can be constructed 
using public domain specifications (The MGuide, version 
2.0, http://cmgm.stanford.edu/pbrown/mguide/index.html), or
can conveniently be purchased from commercial sources
(MicroArray GenII Spotter and MicroArray GenIII Spotter,
Molecular Dynamics, Inc., Sunnyvale, CA). Spotting can
also be effected by printing methods, including those using 
ink jet technology. 

As is well known in the art, microarrays
typically also contain immobilized control nucleic acids.
For controls useful in providing measurements of background
signal for the genome-derived single exon microarrays of
the present invention, a plurality of E. coli genes can
readily be used. As further described in Example 1, 16 or
32 E. coli genes suffice to provide a robust measure of
background noise in such microarrays. 

As is well known in the art, the amplified 
product disposed in arrays on a support substrate to create 
a nucleic acid microarray can consist entirely of natural
nucleotides linked by phosphodiester bonds, or 
alternatively can include either nonnative nucleotides, 
alternative internucleotide linkages, or both, so long as 
complementary binding can be obtained in the hybridization. 

If enzymatic amplification is used to produce the
immobilized probes, the amplifying enzyme will impose
certain further constraints upon the types of nucleic acid 
analogs that can be generated. 

Although particularly described herein as using
high density microarrays constructed on planar substrates,
the methods of the present invention for confirming the
expression of ORFs predicted from genomic sequence can use
any of the known types of microarrays, as herein defined,
including lower density planar arrays, and microarrays on
nonplanar, nonunitary, distributed substrates.

For example, gene expression can be confirmed 
using hybridization to lower density arrays, such as those 
constructed on membranes, such as nitrocellulose, nylon, 
and positively-charged derivatized nylon membranes.
Further, gene expression can also be confirmed using
nonplanar, bead-based microarrays such as are described in
Brenner et al., Proc. Natl. Acad. Sci. USA 97(4):1665-1670
(2000); U.S. Patent No. 6,057,107; and U.S. Patent No.
5,736,330. In theory, a packed collection of such beads
provides in aggregate a higher density of nucleic acid
probe than can be achieved with spotting or lithography 
techniques on a single planar substrate. 

Planar microarrays on solid substrates, however,
provide certain useful advantages, including high
throughput and compatibility with existing readers. For

WO 01/57275 PCT/US01/00667 

example, each standard microscope slide can include at
least 1000, typically at least 2000, preferably 5000 and
up to 10,000 - 50,000 or more nucleic acid probes of
discrete sequence. The number of sequences deposited will
depend on their required application.

Each putative gene can be represented in the
array by a single predicted ORF. Alternatively, genes can
be represented by more than one predicted ORF. For
purposes of measuring differential splicing, more than one
predicted ORF will be provided for a putative gene. And as
is well known in the art, each probe of defined sequence,
representing a single predicted ORF, can be deposited in a
plurality of locations on a single microarray to provide
redundancy of signal.

The genome-derived single exon microarrays
described above differ in several fundamental and
advantageous ways from microarrays presently used in the
gene expression art, including (1) those created by
deposition of mRNA-derived nucleic acids, (2) those created
by in situ synthesis of oligonucleotide probes, and (3)
those constructed from yeast genomic DNA.

Most nucleic acid microarrays that are in use for
study of eukaryotic gene expression have as immobilized
probes nucleic acids that are derived, either directly or
indirectly, from expressed message. As discussed above,
it is common, for example, for such microarrays to be
derived from cDNA/EST libraries, either from those
previously described in the literature, see Lennon et al.,
or from the de novo construction of "problem specific"
libraries targeted at a particular biological question,
R.S. Thomas et al., Cancer Res. (in press). Such
microarrays are herein collectively denominated "EST
microarrays".

Such EST microarrays by definition can measure
expression only of those genes found in EST libraries,
shown herein to represent only a fraction of expressed
genes. Furthermore, such libraries, and thus microarrays
based thereupon, are biased by the tissue or cell type of
message origin, by the expression levels of the respective
genes within the tissues, and by the ability of the message
successfully to have been reverse-transcribed and cloned.

Thus, as further discussed in Example 1, the
methods of the present invention enable sequences that do
not appear in EST or other expression databases to be
determined and subsequently arrayed for expression
measurements; such sequences could not, therefore, have been
represented as probes on an EST microarray. And as further
demonstrated in the examples, infra, the remaining
population of genes identified from genomic sequence by the
methods of the present invention (that is, the one third of
sequences that had previously been accessioned in EST or
other expression databases) is biased toward genes with
higher expression levels.

Representation of a message in an EST and/or cDNA
library depends upon the successful reverse transcription,
optionally but typically with subsequent successful
cloning, of the message. This introduces substantial bias
into the population of probes available for arraying in EST
microarrays.

In contrast, neither reverse transcription nor
cloning is required to produce the probes arrayed on the
genome-derived single exon microarrays of the present
invention. And although the ultimate deposition of a probe
on the genome-derived single exon microarray of the present
invention depends upon a successful amplification from
genomic material, a priori knowledge of the sequence of the
desired amplicon affords greater opportunity to recover any
given probe sequence recalcitrant to amplification than is
afforded by the requirement for successful reverse
transcription and cloning of unknown message in EST
approaches.

Thus, the genome-derived single exon microarrays
of the present invention present a far greater diversity of
probes for measuring gene expression, with far less bias,
than do EST microarrays presently used in the art.

As a further consequence of their ultimate origin
from expressed message, the probes in EST microarrays often
contain poly-A (or complementary poly-T) stretches derived
from the poly-A tail of mature mRNA. These homopolymeric
stretches contribute to cross-hybridization, that is, to a
spurious signal occasioned by hybridization to the
homopolymeric tail of a labeled cDNA that lacks sequence
homology to the gene-specific portion of the probe.

In contrast, the probes arrayed in the genome-derived
single exon microarrays of the present invention
lack homopolymeric stretches derived from message
polyadenylation, and thus can provide more specific signal.
Typically, at least about 50, 60 or 75% of the probes on
the genome-derived single exon microarrays of the present
invention lack homopolymeric regions consisting of A or T,
where a homopolymeric region is defined for purposes herein
as stretches of 25 or more, typically 30 or more, identical
nucleotides.
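
By way of nonlimiting illustration, the homopolymeric-region
criterion just defined can be sketched in Python. The function
names, the default threshold of 25 nucleotides, and the example
sequences are assumptions introduced for this sketch; they are
not drawn from the Examples.

```python
import re


def has_homopolymeric_region(seq: str, min_len: int = 25) -> bool:
    """Return True if seq contains a run of min_len or more A's or T's.

    min_len defaults to 25, the lower of the two thresholds given in
    the text; pass 30 for the stricter definition.
    """
    seq = seq.upper()
    for base in "AT":
        # e.g. "A{25,}" matches 25 or more consecutive A's
        if re.search(f"{base}{{{min_len},}}", seq):
            return True
    return False


def fraction_lacking_homopolymer(probes, min_len: int = 25) -> float:
    """Fraction of probe sequences with no A/T homopolymeric region."""
    if not probes:
        raise ValueError("at least one probe sequence is required")
    lacking = sum(1 for p in probes if not has_homopolymeric_region(p, min_len))
    return lacking / len(probes)
```

A set of probe sequences can then be screened by comparing the
value returned by fraction_lacking_homopolymer against a chosen
percentage, such as 0.50, 0.60 or 0.75.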

A further distinction, which also affects the
specificity of hybridization, is occasioned by the typical
derivation of EST microarray probes from cloned material.
Because much of the probe material disposed as probes on
EST microarrays is excised or amplified from plasmid,
phage, or phagemid vectors, EST microarrays typically
include a fair amount of vector sequence, more so when the
probes are amplified, rather than excised, from the vector.

In contrast, the vast majority of probes in the
genome-derived single exon microarrays of the present
invention contain no prokaryotic or bacteriophage vector
sequence, having been amplified directly or indirectly from
genomic DNA. Typically, therefore, at least about 50, 60,
70 or 80% or more of individual exon-including probes
disposed on a genome-derived single exon microarray of the
present invention lack vector sequence, and particularly
lack sequences drawn from plasmids and bacteriophage.
Preferably, at least about 85, 90 or more than 90% of
exon-including probes in the genome-derived single exon
microarray of the present invention lack vector sequence.
With attention to removal of vector sequences through
preprocessing 24, percentages of vector-free exon-including
probes can be as high as 95 - 99%. The substantial absence
of vector sequence from the genome-derived single exon
microarrays of the present invention results in greater
specificity during hybridization, since spurious
cross-hybridization to a probe vector sequence is reduced.

As a further consequence of excision or
amplification of probes from vectors in construction of EST
microarrays, the probes arrayed thereon often contain
artificial sequence, derived from vector polylinker
multiple cloning sites, at both 5' and 3' ends. The probes
disposed upon the genome-derived single exon microarrays
need have no such artificial sequence appended thereto.

As mentioned above, however, the ORF-specific
primers used to amplify putative ORFs can include
artificial sequences, typically 5' to the ORF-specific
primer sequence, useful for "universal" (that is,
independent of ORF sequence) priming of subsequent
amplification or sequencing reactions. When such
"universal" 5' and/or 3' priming sequences are appended to
the amplification primers, the probes disposed upon the
genome-derived single exon microarray will include
artificial sequence similar to that found in EST
microarrays. However, the genome-derived single exon
microarray of the present invention can be made without
such sequences, and if so constructed, presents an even
smaller amount of nonspecific sequence that would
contribute to nonspecific hybridization.

Yet another consequence of typical use of cloned
material as probes in EST microarrays is that such
microarrays contain probes that result from cloning
artifacts, such as chimeric molecules containing coding
region of two separate genes. Derived from genomic
material, typically not thereafter cloned, the probes of
the genome-derived single exon microarrays of the present
invention lack such cloning artifacts, and thus provide
greater specificity of signal in gene expression
measurements.

A further consequence of the cloned origin of
probes on many EST microarrays is that the individual
probes often have disparate sizes, which can cause the
optimal hybridization stringency to vary among probes on a
single microarray. In contrast, as discussed above, the
probes arrayed on the genome-derived single exon
microarrays of the present invention can readily be
designed to have a narrow distribution in sizes, with the
range of probe sizes no greater than about 10% of the
average size, typically no greater than about 5% of the
average probe size.
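
By way of nonlimiting illustration, the size criterion above
can be checked with a short Python sketch. It assumes that
"range" means the difference between the longest and shortest
probe; the function names and the example lengths are
hypothetical.

```python
def size_range_fraction(lengths):
    """Return (longest - shortest) / mean length for a set of probes.

    Assumption for this sketch: "range of probe sizes" is read as the
    difference between the maximum and minimum probe length.
    """
    if not lengths:
        raise ValueError("at least one probe length is required")
    mean = sum(lengths) / len(lengths)
    return (max(lengths) - min(lengths)) / mean


def has_narrow_size_distribution(lengths, max_fraction=0.10):
    """True if the size range is no greater than max_fraction of the mean."""
    return size_range_fraction(lengths) <= max_fraction
```

For probes of 490, 500 and 510 bp the range is 20 bp against a
mean of 500 bp, or 4% of the average size, which satisfies both
the 10% and the 5% criteria.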

Because of their origin from fully- or partially-spliced
message, probes disposed upon EST arrays will often
include multiple exons. The percentage of such
exon-spanning probes in an EST microarray can be calculated, on
average, based upon the predicted number of exons/gene for
the given species and the average length of the immobilized
probes. For human genes, the near-complete sequence of
human chromosome 22, Dunham et al., Nature 402(6761):489-95
(1999), predicts that human genes average 5.5 exons/gene.
Even with probes of 200 - 500 bp, the vast majority of
human EST microarray probes include more than one exon.

In contrast, by virtue of their origin from
algorithmically identified ORFs in genomic sequence, the
probes in the genome-derived single exon microarrays of the
present invention can consist of individual exons. Thus,
in contrast to EST microarrays, at least about 50, 60, 70,
75, 80, 85, 95 or 99% of probes deposited in the
genome-derived microarray of the present invention consist of, or
include, no more than one predicted ORF.

This provides the ability, not readily achieved
using EST microarrays, to use the genome-derived single
exon microarrays of the present invention to measure
tissue-specific expression of individual exons, which in
turn allows differential splicing events to be detected and
characterized, and in particular, allows the correlation of
differential splicing to tissue-specific expression
patterns.

Furthermore, the exons that are represented in
EST microarrays are often biased toward the 3' or 5' end of
their respective genes, since sequencing strategies used
for EST identification are so biased. In contrast, no such
3' or 5' bias necessarily inheres in the selection of exons
for disposition on the genome-derived single exon
microarrays of the present invention.

Conversely, the probes provided on the genome-derived
single exon microarrays of the present invention
typically, but need not necessarily, include intronic
and/or intergenic sequence that is absent from EST
microarrays, which are derived from mature mRNA.
Typically, at least about 50, 60, 70, 80 or 90% of the
exon-including probes on the genome-derived single exon
microarrays of the present invention include sequence drawn
from noncoding regions. As discussed above, the additional
presence of noncoding region does not significantly
interfere with measurement of gene expression, and provides
the additional opportunity to assay prespliced RNA, and
thus measure such phenomena as nuclear export control.

The genome-derived single exon microarrays of the
present invention are also quite different from in situ
synthesis microarrays, where probe size is severely
constrained by inadequacies in the photolithographic
synthesis process.

Typically, probes arrayed on in situ synthesis
microarrays are limited to a maximum of about 25 bp. As a
well known consequence, hybridization to such chips must be
performed at low stringency. In order, therefore, to
achieve unambiguous sequence-specific hybridization
results, the in situ synthesis microarray requires
substantial redundancy, with concomitant programmed
arraying for each probe of probe analogues with altered
(i.e., mismatched) sequence.

In contrast, the longer probe length of the
genome-derived single exon microarrays of the present
invention allows much higher stringency hybridization and
wash. Typically, therefore, exon-including probes on the
genome-derived single exon microarrays of the present
invention average at least about 100, 200, 300, 400 or
500 bp in length. By obviating the need for substantial
probe redundancy, this approach permits a higher density of
probes for discrete exons or genes to be arrayed on the
microarrays of the present invention than can be achieved
for in situ synthesis microarrays.

A further distinction is that the probes in in
situ synthesis microarrays typically are covalently linked
to the substrate surface. In contrast, the probes disposed
on the genome-derived microarray of the present invention
typically are, but need not necessarily be, bound
noncovalently to the substrate.

Furthermore, the short probe size on in situ
microarrays causes large percentage differences in the
melting temperature of probes hybridized to their
complementary target sequence, and thus causes large
percentage differences in the theoretically optimum
stringency across the array as a whole.

In contrast, the larger probe size in the
microarrays of the present invention creates lower
percentage differences in melting temperature across the
range of arrayed probes.

A further significant advantage of the
microarrays of the present invention over in situ
synthesized arrays is that the quality of each individual
probe can be confirmed before deposition. In contrast, the
quality of probes cannot be assessed on a probe-by-probe
basis for the in situ synthesized microarrays presently
being used.

The genome-derived single exon microarrays of the
present invention are also distinguished over, and present
substantial benefits over, the genome-derived microarrays
from lower eukaryotes such as yeast. Lashkari et al.,
Proc. Natl. Acad. Sci. USA 94:13057-13062 (1997).

Only about 220 - 250 of the 6100 or so nuclear
genes in Saccharomyces cerevisiae (that is, only about 4
- 5%) have standard, spliceosomal, introns, Lopez et al.,
Nucl. Acids Res. 28:85-86 (2000); Spingola et al., RNA
5(2):221-34 (1999). Furthermore, the entire yeast genome
has already been sequenced. These two facts permit the
ready amplification and disposition of single-ORF amplicons
on such microarrays without the requirement for antecedent
use of gene prediction and/or comparative sequence
analyses.

Thus, a significant aspect of the present
invention is the ability to identify and to confirm
expression of predicted coding regions in genomic sequence
drawn from eukaryotic organisms that have a higher
percentage of genes having introns than do yeast such as
Saccharomyces cerevisiae, particularly in genomic sequence
drawn from eukaryotes in which at least about 10, 20 or 50%
of protein-encoding genes have introns. In preferred
embodiments, the methods and apparatus of the present
invention are used to identify and confirm expression of
novel genes from genomic sequence of eukaryotes in which
the average number of introns per gene is at least about
one, two or three or more.

After the physical substrate is prepared, 
experimental verification of predicted function is 
performed. 

In a preferred embodiment of the present
invention, where the function sought to be identified in
genomic sequence is protein coding, experimental
verification is performed by measuring expression of the
putative ORFs, typically through nucleic acid hybridization
experiments, and in particularly preferred embodiments,
through hybridization to genome-derived single exon
microarrays prepared as above-described.

Expression is conveniently measured and expressed
for each probe in the microarray as a ratio of the
expression measured concurrently in a plurality of mRNA
sources, according to techniques well known in the
microarray art, reviewed in Schena et al., and as further
described in Example 2, below. The mRNA source for the
reference against which specific expression is measured can
be drawn from a homogeneous mRNA source, such as a single
cultured cell-type, or alternatively can be heterogeneous,
as from a pool of mRNA derived from multiple tissues and/or
cell types, as further described in Example 2, infra.
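
By way of nonlimiting illustration, the per-probe expression
ratio described above can be sketched in Python. The background
subtraction, the signal floor, and the base-2 logarithm are
assumptions made for this sketch and are not the procedure of
Example 2; the function names are hypothetical.

```python
import math


def expression_ratio(index_signal, reference_signal, background=0.0, floor=1.0):
    """Ratio of index-source signal to reference-source signal for one probe.

    Assumptions for this sketch: a single background value is subtracted
    from both channels, and corrected signals are clamped to `floor` so
    that the ratio is always defined and positive.
    """
    idx = max(index_signal - background, floor)
    ref = max(reference_signal - background, floor)
    return idx / ref


def log2_expression_ratio(index_signal, reference_signal, background=0.0, floor=1.0):
    """Base-2 logarithm of the ratio; 0 means equal expression in both sources."""
    return math.log2(
        expression_ratio(index_signal, reference_signal, background, floor)
    )
```

With an index signal of 800 and a reference signal of 200 the
ratio is 4, corresponding to a base-2 logarithm of 2.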

mRNA can be prepared by standard techniques, see
Ausubel et al. and Maniatis et al., or purchased
commercially. The mRNA is then typically
reverse-transcribed in the presence of labeled nucleotides: the
index source (that in which expression is desired to be
measured) is reverse transcribed in the presence of
nucleotides labeled with a first label, typically a
fluorophore (fluorochrome; fluor; fluorescent dye); the
reference source is reverse transcribed in the presence of
a second label, typically a fluorophore, typically
fluorometrically-distinguishable from the first label. As
further described in Example 2, infra, Cy3 and Cy5 dyes
prove particularly useful in these methods. After partial
purification of the index and reference targets,
hybridization to the probe array is conducted according to
standard techniques, typically under a coverslip.

After wash, microarrays are conveniently scanned
using a commercial microarray scanning device, such as a
Gen3 Scanner (Molecular Dynamics, Sunnyvale, CA). Data on
expression is then passed, with or without interim storage,
to process 500, where the results for each probe are
related to the original sequence.

Often, hybridization of target material to the
genome-derived single exon microarray will identify certain
of the probes thereon as of particular interest. Thus, it
is often desirable that the user be able readily to obtain
sufficient quantities of an individual probe, either for
subsequent arrayed deposition upon an additional support
substrate, often as part of a microarray having a plurality
of probes so identified, or alternatively or additionally
as a solitary solid-phase or solution-phase probe, for
further use.

Thus, in another aspect, the present invention
provides compositions and kits for the ready production of
nucleic acids identical in sequence to, or substantially
identical in sequence to, probes on the genome-derived
single exon microarrays of the present invention.

In this aspect, a small quantity of each probe is
disposed, typically without attachment to substrate, in a
spatially-addressable ordered set, typically one per well
of a microtiter dish. Although a 96 well microtiter plate
can be used, greater efficiency is obtained using higher
density arrays, such as are provided by microtiter plates
having 384, 864, 1536, 3456, 6144, or 9600 wells, and
although microtiter plates having physical depressions
(wells) are conveniently used, any device that permits
addressable withdrawal of reagent from
fluidly-noncommunicating areas can be used.
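
By way of nonlimiting illustration, spatial addressing of such
an ordered set can be sketched in Python. The sketch assumes
that each plate format keeps the 2:3 row-to-column geometry of
the 96 well plate (8 x 12) and that probes are assigned in
row-major order; both are assumptions of the sketch, and the
function names are hypothetical.

```python
import math


def plate_shape(wells):
    """Return (rows, columns) for a plate, assuming a 2:3 layout.

    Assumption for this sketch: rows:columns is 2:3, as in the
    8 x 12 arrangement of a 96 well plate.
    """
    rows = math.isqrt(wells * 2 // 3)
    cols = rows * 3 // 2
    if rows * cols != wells:
        raise ValueError(f"{wells} wells does not fit a 2:3 layout")
    return rows, cols


def well_address(probe_index, wells):
    """Map a zero-based probe index to a one-based (row, column) address."""
    rows, cols = plate_shape(wells)
    if not 0 <= probe_index < wells:
        raise ValueError("probe index is outside the plate")
    row, col = divmod(probe_index, cols)  # row-major fill order
    return row + 1, col + 1
```

Under these assumptions each of the plate sizes listed above
resolves to a whole number of rows and columns, so that any
probe can be located from its position in the ordered set.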

In this aspect of the invention, therefore, a
fluidly noncommunicating addressable ordered set of
individual probes, corresponding to those on a
genome-derived single exon microarray, is provided, with each
probe in sufficient quantity to permit amplification, such
as by PCR. As earlier mentioned, the ORF-specific
5' primers used for genomic amplification can have a first
common sequence added thereto, and the ORF-specific 3'
primers used for genomic amplification can have a second,
different, common sequence added thereto, thus permitting,
in this preferred embodiment, the use of a single set of 5'
and 3' primers to amplify any one of the probes from the
amplifiable ordered set.

Each discrete amplifiable probe can also be
packaged with amplification primers, solutes, buffers,
etc., and can be provided in dry (e.g., lyophilized) form
or wet, in the latter case typically with addition of
agents that retard evaporation.

In another aspect of the present invention, a
genome-derived single-exon microarray is packaged together
with such an ordered set of amplifiable probes
corresponding to the probes, or one or more subsets of
probes, thereon. In alternative embodiments, the ordered
set of amplifiable probes is packaged separately from the
genome-derived single exon microarray.

In some embodiments, the microarray and/or
ordered probe set are further packaged with recordable
media that provide probe identification and addressing
information, and that can additionally contain annotation
information, such as gene expression data. Such recordable
media can be packaged with the microarray, with the ordered
probe set, or with both.

If the microarray is constructed on a substrate
that incorporates recordable media, such as is described in
international patent application no. WO 98/12559, then
separate packaging of the genome-derived single exon
microarray and the bioinformatic information is not
required.

The amount of amplifiable probe material should
be sufficient to permit at least one amplification
sufficient for subsequent hybridization assay.

Although the use of high density genome-derived
microarrays on solid planar substrates is presently a
preferred approach for the physical confirmation and
characterization of the expression of sequences predicted
to encode protein, other types of microarrays (as herein
defined) can also be used.

Furthermore, as earlier mentioned, experimental
verification of the function predicted from genomic
sequence in process 200 can be bioinformatic, rather than,
or additional to, physical verification.

For example, where the function desired to be
identified is protein coding, the predicted ORFs can be
compared bioinformatically to sequences known or suspected
of being expressed.

Thus, the sequences output from process 300 (or
process 200), can be used to query expression databases,
such as EST databases, SNP ("single nucleotide
polymorphism") databases, known cDNA and mRNA sequences,
SAGE ("serial analysis of gene expression") databases, and
more generalized sequence databases that allow query for
expressed sequences. Such query can be done by any
sequence query algorithm, such as BLAST ("basic local
alignment search tool"). The results of such query
(including information on identical sequences and
information on nonidentical sequences that have diffuse or
focal regions of sequence homology to the query sequence)
can then be passed directly to process 500, or used to
inform analyses subsequently undertaken in process 200,
process 300, or process 400.

Experimental data, whether obtained by physical
or bioinformatic assay in process 400, is passed to process
500 where it is usefully related to the sequence data 
itself, a process colloquially termed "annotation". Such 
annotation can be done using any technique that usefully 
relates the functional information to the sequence, as, for 
example, by incorporating the functional data into the 
record itself, by linking records in a hierarchical or 
relational database, by linking to external databases, or 
by a combination thereof. Such database techniques are 
well within the skill in the art. 
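
By way of nonlimiting illustration, annotation by linking
records in a relational database can be sketched in Python using
the standard sqlite3 module. The table names, column names, and
example values are hypothetical and are introduced only for this
sketch; they do not describe the database actually used.

```python
import sqlite3


def build_annotation_db():
    """Create an in-memory database with hypothetical sequence and
    annotation tables linked by a sequence identifier."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sequence (
            seq_id      TEXT PRIMARY KEY,
            description TEXT
        );
        CREATE TABLE annotation (
            seq_id   TEXT REFERENCES sequence(seq_id),
            start_nt INTEGER,   -- first annotated nucleotide
            end_nt   INTEGER,   -- last annotated nucleotide
            source   TEXT,      -- e.g. 'physical' or 'bioinformatic'
            value    TEXT       -- the functional information itself
        );
        """
    )
    return conn


def annotate(conn, seq_id, start_nt, end_nt, source, value):
    """Relate one piece of functional data to a region of a sequence."""
    conn.execute(
        "INSERT INTO annotation VALUES (?, ?, ?, ?, ?)",
        (seq_id, start_nt, end_nt, source, value),
    )


def annotations_for(conn, seq_id):
    """Return all annotations for a sequence, ordered along the sequence."""
    return conn.execute(
        "SELECT start_nt, end_nt, source, value FROM annotation "
        "WHERE seq_id = ? ORDER BY start_nt, source",
        (seq_id,),
    ).fetchall()
```

Keeping the functional data in a separate, linked table lets
results from physical and bioinformatic assay accumulate against
the same sequence record without altering the record itself.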

The annotated sequence data can be stored 
locally, uploaded to genomic sequence database 100, and/or 
displayed 800. 

The methods and apparatus of the present 
invention rapidly produce functional information from 
genomic sequence. Coupled with the escalating pace at 
which sequence now accumulates, the rapid pace of sequence 
annotation produces a need for methods of displaying the 
information in meaningful ways. 

FIG. 3 shows visual display 80 presenting a 
single genomic sequence annotated according to the present 
invention. Because of its nominal resemblance to artistic 
works of Piet Mondrian, visual display 80 is alternatively 
described herein as a "Mondrian".

Each of the visual elements of display 80 is
aligned with respect to the genomic sequence being
annotated (hereinafter, the "annotated sequence"). Given
the number of nucleotides typically represented in an
annotated sequence, representation of individual
nucleotides would rarely be readable in hard copy output of
display 80. Typically, therefore, the annotated sequence
is schematized as rectangle 89, extending from the left
border of display 80 to its right border. By convention
herein, the left border of rectangle 89 represents the
first nucleotide of the sequence and the right border of
rectangle 89 represents the last nucleotide of the
sequence.

As further discussed below, however, the Mondrian
visual display of annotated sequence can serve as a
convenient graphical user interface for computerized
representation, analysis, and query of information stored
electronically. For such use, the individual nucleotides
can conveniently be linked to the X axis coordinate of
rectangle 89. This permits the annotated sequence at any
point within rectangle 89 readily to be viewed, either
automatically (for example, by time-delayed appearance of
a small overlaid window upon movement of a cursor or other
pointer over rectangle 89) or through user intervention,
as by clicking a mouse or other pointing device at a point
in rectangle 89.
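
By way of nonlimiting illustration, the linkage between
nucleotide position and the X axis coordinate of rectangle 89
can be sketched in Python as a linear mapping. The function
names, the one-based nucleotide numbering, and the pixel
coordinates are assumptions made for this sketch.

```python
def nucleotide_to_x(position, seq_len, x_left, x_right):
    """X coordinate of a one-based nucleotide position in rectangle 89.

    The first nucleotide maps to the left border and the last to the
    right border, following the convention stated in the text.
    """
    if seq_len < 2:
        return float(x_left)
    fraction = (position - 1) / (seq_len - 1)
    return x_left + fraction * (x_right - x_left)


def x_to_nucleotide(x, seq_len, x_left, x_right):
    """Nearest one-based nucleotide position under a pointer at x.

    Pointer positions outside the rectangle are clamped to its borders.
    """
    fraction = (x - x_left) / (x_right - x_left)
    fraction = min(max(fraction, 0.0), 1.0)
    return 1 + round(fraction * (seq_len - 1))
```

The inverse function is the one a graphical user interface
would call when a cursor hovers over, or a pointing device is
clicked within, the rectangle.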

Visual display 80 is generated after user
specification of the genomic sequence to be displayed.
Such specification can consist of or include an accession
number for a single clone (e.g., a single BAC accessioned
into GenBank), wherein the starting and stopping
nucleotides are thus absolutely identified, or
alternatively can consist of or include an anchor or
fulcrum point about which a chosen range of sequence is
anchored, thus providing relative endpoints for the
sequence to be displayed. For example, the user can anchor
such a range about a given chromosomal map location, gene
name, or even a sequence returned by query for similarity
or identity to an input query sequence. When visual
display 80 is used as a graphical user interface to
computerized data, additional control over the first and
last displayed nucleotide will typically be dynamically
selectable, as by use of standard zooming and/or selection
tools.
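
By way of nonlimiting illustration, the derivation of relative
endpoints from an anchor point and a chosen range can be
sketched in Python. Centering the range on the anchor and
shifting it to stay within the available sequence are
assumptions of this sketch; the function name is hypothetical.

```python
def range_about_anchor(anchor, span, seq_len):
    """Return one-based (first, last) nucleotides of a window of `span`
    nucleotides placed about `anchor`.

    Assumptions for this sketch: the window is centered on the anchor
    where possible, and is shifted (not shrunk) when it would run past
    either end of a sequence of length seq_len.
    """
    first = max(1, anchor - span // 2)
    last = min(seq_len, first + span - 1)
    first = max(1, last - span + 1)  # shift left if clipped at the right end
    return first, last
```

For an anchor at nucleotide 5000 and a chosen range of 1000
nucleotides in a 100,000 nucleotide sequence, the sketch returns
endpoints 4500 and 5499.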

Field 81 of visual display 80 is used to present
the output from process 200, that is, to present the
bioinformatic prediction of those sequences having the
desired function within the genomic sequence. Functional
sequences are typically indicated by at least one rectangle
83 (83a, 83b, 83c), the left and right borders of which
respectively indicate, by their X-axis coordinates, the
starting and ending nucleotides of the region predicted to
have function.

Where a single bioinformatic method or approach
identifies a plurality of regions having the desired
function, a plurality of rectangles 83 is disposed
horizontally in field 81. Where multiple methods and/or
approaches are used to identify function, each such method
and/or approach can be represented by its own series of
horizontally disposed rectangles 83, each such horizontally
disposed series of rectangles offset vertically from those
representing the results of the other methods and
approaches.

Thus, rectangles 83a in FIG. 3 represent the
functional predictions of a first method of a first
approach for predicting function, rectangles 83b represent
the functional predictions of a second method and/or second
approach for predicting that function, and rectangles 83c
represent the predictions of a third method and/or
approach.

Where the function desired to be identified is
protein coding, field 81 is used to present the
bioinformatic prediction of sequences encoding protein.
For example, rectangles 83a can represent the results from
GRAIL or GRAIL II, rectangles 83b can represent the results
from GENEFINDER, and rectangles 83c can represent the
results from DICTION.

Optionally, and preferably, rectangles 83
collectively representing predictions of a single method
and/or approach are identically colored and/or textured,
and are distinguishable from the color and/or texture used
for a different method and/or approach.

Alternatively, or in addition, the color, hue,
density, or texture of rectangles 83 can be used further to
report a measure of the bioinformatic reliability of the
prediction. For example, many gene prediction programs
will report a measure of the reliability of prediction.
Thus, increasing degrees of such reliability can be
indicated, e.g., by increasing density of shading. Where
display 80 is used as a graphical user interface, such
measures of reliability, and indeed all other results
output by the program, can additionally or alternatively be
made accessible through linkage from individual rectangles
83, as by time-delayed window ("tool tip" window), or by
pointer (e.g., mouse)-activated link.
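
By way of nonlimiting illustration, the indication of
reliability by density of shading can be sketched in Python.
The sketch assumes that the prediction program's reliability
measure has already been scaled to the interval 0 to 1 and that
shading is rendered as a gray level; the function name and the
hexadecimal color format are hypothetical choices.

```python
def shading_for_reliability(score):
    """Return a gray color string whose darkness increases with score.

    Assumption for this sketch: `score` is a reliability measure scaled
    to 0..1 (values outside that interval are clamped). A score of 0
    gives white and a score of 1 gives black.
    """
    score = min(max(score, 0.0), 1.0)
    gray = round(255 * (1.0 - score))
    return f"#{gray:02x}{gray:02x}{gray:02x}"
```

A rectangle 83 for a prediction scored 1.0 would thus be drawn
in "#000000", and one scored 0.0 in "#ffffff".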

As earlier described, increased predictive
reliability can be achieved by requiring consensus among
methods and/or approaches to determining function. Thus,
field 81 can include a horizontal series of rectangles 83
that indicate one or more degrees of consensus in
predictions of function.
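
By way of nonlimiting illustration, a consensus series of
rectangles can be derived from the individual prediction series
as sketched below in Python. The sketch assumes that each method
reports non-overlapping regions as inclusive (start, end)
nucleotide pairs, and it defines consensus simply as agreement
by at least a chosen number of methods; the function name and
that definition are assumptions of the sketch.

```python
def consensus_regions(predictions, min_methods):
    """Regions predicted by at least `min_methods` of the methods.

    predictions: one list per method of inclusive (start, end) pairs,
    assumed non-overlapping within a single method.
    Returns a list of inclusive (start, end) pairs in sequence order.
    Abutting regions are reported separately rather than merged.
    """
    events = []
    for intervals in predictions:
        for start, end in intervals:
            events.append((start, 1))     # coverage rises at start
            events.append((end + 1, -1))  # and falls just past end
    events.sort()  # at equal positions, falls sort before rises

    regions = []
    depth = 0
    region_start = None
    for position, delta in events:
        new_depth = depth + delta
        if depth < min_methods <= new_depth:
            region_start = position
        elif new_depth < min_methods <= depth:
            regions.append((region_start, position - 1))
        depth = new_depth
    return regions
```

Calling the function with successively larger values of
min_methods yields the successive degrees of consensus that can
be displayed as additional series in field 81.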

Although FIG. 3 shows three series of
horizontally disposed rectangles in field 81, display 80
can include as few as one such series of rectangles and as
many as can discriminably be displayed, depending upon the
number of methods and/or approaches used to predict a given
function.

Furthermore, field 81 can be used to show
predictions of a plurality of different functions.
However, the increased visual complexity occasioned by such
display makes more useful the ability of the user to select
a single function for display. When display 80 is used as
a graphical user interface for computer query and analysis,
such function can usefully be indicated and
user-selectable, as by a series of graphical buttons or tabs
(not shown in FIG. 3).

Rectangle 89 is shown in FIG. 3 as including 
interposed rectangle 84. Rectangle 84 represents the 
portion of annotated sequence for which predicted 
functional information has been assayed physically, with 
the starting and ending nucleotides of the assayed material 
indicated by the X axis coordinates of the left and right 
borders of rectangle 84. Rectangle 85, with optional
inclusive circles 86 (86a, 86b, and 86c), displays the
results of such physical assay.

Although a single rectangle 84 is shown in FIG. 
3, physical assay is not limited to just one region of 
annotated genomic sequence. It is expected that an 
increasing percentage of regions predicted to have function 
by process 200 will be assayed physically, and that display 
80 will accordingly, for any given genomic sequence, have 
an increasing number of rectangles 84 and 85, representing 
an increased density of sequence annotation. 

Where the function desired to be identified is 
protein coding, rectangle 84 identifies the sequence of the 
probe used to measure expression. In embodiments of the 
present invention where expression is measured using 
genome-derived single exon microarrays, rectangle 84 
identifies the sequence included within the probe 
immobilized on the support surface of the microarray. As 
noted supra, such probe will often include a small amount 
of additional, synthetic, material incorporated during 
amplification and designed to permit reamplification of the
probe, which sequence is typically not shown in display 80.

Rectangle 87 is used to present the results of
bioinformatic assay of the genomic sequence. For example,
where the function desired to be identified is protein
coding, process 400 can include bioinformatic query of
expression databases with the sequences predicted in
process 200 to encode exons. And as earlier discussed,
because bioinformatic assay presents fewer constraints than
does physical assay, often the entire output of process 200
can be used for such assay, without further subsetting
thereof by process 300. Therefore, rectangle 87 typically
need not have separate indicators therein of regions
submitted for bioinformatic assay; that is, rectangle 87
typically need not have regions therein analogous to
rectangles 84 within rectangle 89.

Rectangle 87 as shown in FIG. 3 includes smaller
rectangles 880 and 88. Rectangles 880 indicate regions
that returned a positive result in the bioinformatic assay,
with rectangles 88 representing regions that did not return
such positive results. Where the function desired to be
predicted and displayed is protein coding, rectangles 880
indicate regions of the predicted exons that identify
sequence with significant similarity in expression
databases, such as EST, SNP, SAGE databases, with
rectangles 88 indicating genes novel over those identified
in existing expression databases.

Rectangles 880 can further indicate, through 
color, shading, texture, or the like, additional 
information obtained from bioinformatic assay.

For example, where the function assayed and
displayed is protein coding, the degree of shading of
rectangles 880 can be used to represent the degree of
sequence similarity found upon query of expression
databases. The number of levels of discrimination can be
as few as two (identity, and similarity, where similarity
has a user-selectable lower threshold). Alternatively, as
many different levels of discrimination can be indicated as
can visually be discriminated.
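
By way of non-limiting illustration only, the assignment of
shading levels described above can be sketched in Python as
follows. The sketch assumes that similarity is reported as a
fractional identity between 0 and 1; the function name
shading_level, its default threshold of 0.8, and the even
spacing of intermediate levels are hypothetical choices made
for this example and are not taken from the program described
herein.

```python
def shading_level(identity, threshold=0.8, levels=2):
    """Map fractional sequence identity to a discrete shading level.

    Returns None when identity falls below the user-selectable lower
    threshold (no hit is shaded). Otherwise returns an integer from 1
    (weakest similarity shown) up to `levels` (exact identity).
    All names and defaults are illustrative assumptions.
    """
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must lie between 0 and 1")
    if levels < 2:
        raise ValueError("at least two levels are needed")
    if identity < threshold:
        return None
    if identity == 1.0:
        return levels  # the top level is reserved for identity
    # Spread the remaining similarity range evenly over levels 1..levels-1.
    span = 1.0 - threshold
    step = int((identity - threshold) / span * (levels - 1))
    return min(step + 1, levels - 1)
```

With levels set to 2 the function reduces to the two-level
case (identity, and similarity above the threshold); larger
values yield finer gradations, up to as many as can visually
be discriminated.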

Where display 80 is used as a graphical user
interface, rectangles 880 can additionally provide links
directly to the sequences identified by the query of
expression databases, and/or statistical summaries thereof.
As with each of the previously discussed uses of display
80 as a graphical user interface, it should be understood
that the information accessed via display 80 need not be
resident on the computer presenting such display, which
often will be serving as a client, with the linked
information resident on one or more remotely located
servers.

Rectangle 85 displays the results of physical
assay of the sequence delimited by its left and right
borders.

Rectangle 85 can consist of a single rectangle, 
thus indicating a single assay, or alternatively, and 
increasingly typically, will consist of a series of
rectangles (85a, 85b, 85c) indicating separate physical
assays of the same sequence. 

Where the function assayed is gene expression,
and where gene expression is assayed as herein described
using simultaneous two-color fluorescent detection of
hybridization to genome-derived single exon microarrays,
individual rectangles 85 can be colored to indicate the
degree of expression relative to control. Conveniently,
shades of green can be used to depict expression in the
sample over control values, and shades of red used to
depict expression less than control, corresponding to the
spectra of the Cy3 and Cy5 dyes conventionally used for
respective labeling thereof. Additional functional
information can be provided in the form of circles 86 (86a,
86b, 86c), where the diameter of the circle can be used to
indicate expression intensity. As discussed infra, such
relative expression (expression ratios) and absolute
expression (signal intensity) can be expressed using
normalized values.
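
By way of non-limiting illustration only, the coloring and
sizing convention described above can be sketched in Python
as follows. The sketch assumes that relative expression is
taken as the base-2 logarithm of the sample-to-control signal
ratio, that shading saturates at an 8-fold change, and that
signal intensity is scaled against a 16-bit maximum of 65535;
the function name expression_glyph and all of its parameters
are hypothetical and are not taken from the program described
herein.

```python
import math


def expression_glyph(sample, control, max_fold=8.0,
                     max_diameter=10.0, max_signal=65535.0):
    """Return (rgb, diameter) for one assay rectangle and its circle.

    Green shades mark expression above control, red shades mark
    expression below control; the circle diameter scales with the
    sample signal intensity. Parameter defaults are assumptions.
    """
    if sample <= 0 or control <= 0:
        raise ValueError("signals must be positive")
    log_ratio = math.log2(sample / control)
    cap = math.log2(max_fold)  # log ratio at which shading saturates
    strength = min(abs(log_ratio) / cap, 1.0)
    shade = int(round(255 * strength))
    if log_ratio > 0:
        rgb = (0, shade, 0)    # green: sample over control
    elif log_ratio < 0:
        rgb = (shade, 0, 0)    # red: sample under control
    else:
        rgb = (0, 0, 0)        # no change relative to control
    diameter = max_diameter * min(sample / max_signal, 1.0)
    return rgb, diameter
```

For example, a sample signal of 8000 against a control of
1000 yields a fully saturated green rectangle, whereas a
sample of 1000 against a control of 4000 yields an
intermediate shade of red.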

Where display 80 is used as a graphical user
interface, rectangle 85 can be used as a link to further
information about the assay. For example, where the assay
is one for gene expression, each rectangle 85 can be used
to link to information about the source of the hybridized
mRNA, the identity of the control, raw or processed data
from the microarray scan, or the like.

FIG. 4 is a rendition of display 80 representing
gene prediction and gene expression for a hypothetical BAC,
showing conventions used in the Examples presented infra.
BAC sequence ("Chip seq.") 89 is presented, with the
physically assayed region thereof (corresponding to
rectangle 84 in FIG. 3) shown in white. Algorithmic gene
predictions are shown in field 81, with predictions by
GRAIL shown, predictions by GENEFINDER, and predictions by
DICTION shown. Within rectangle 87, regions of sequence
that, when used to query expression databases, return
identical or similar sequences ("EST hit") are shown as
white rectangles (corresponding to rectangles 880 in FIG.
3), gray indicates low homology, and black indicates
unknowns (where black and gray would correspond to
rectangles 88 in FIG. 3).

Although FIGS. 3 and 4 show a single stretch of 
sequence, uninterrupted from left to right, longer 
sequences are usefully represented by vertical stacking of 
such individual Mondrians, as shown in FIGS. 9 and 10. 


Single Exon Probes Useful For Measuring Gene Expression 

The methods and apparatus of the present
invention rapidly produce functional information from
genomic sequence. Where the function to be identified is
protein coding, the methods and apparatus of the present
invention rapidly identify and confirm the expression of
portions of genomic sequence that function to encode
protein. As a direct result, the methods and apparatus of
the present invention rapidly yield large numbers of
single-exon nucleic acid probes, the majority from
previously unknown genes, each of which is useful for
measuring and/or surveying expression of a specific gene in
one or more tissues or cell types.

It is, therefore, another aspect of the present
invention to provide genome-derived single exon nucleic
acid probes useful for gene expression analysis, and 
particularly for gene expression analysis by microarray. 

Using the methods and genome-derived single-exon
microarrays of the present invention, we have for example
readily identified a large number of unique ORFs from human
genomic sequence. Using single exon probes that encompass
these ORFs, we have demonstrated, through microarray
hybridization analysis, the expression of 12,821 of these
ORFs in brain.

As would immediately be appreciated by one of 
skill in the art, each single exon probe having 
demonstrable expression in brain is currently available for 
use in measuring the level of its ORF's expression in
brain.

Diseases of the brain and nervous system are a
significant cause of human morbidity and mortality.
Increasingly, genetic factors are being found that
contribute to predisposition, onset, and/or aggressiveness
of most, if not all, of these diseases. Although mutations
in single genes have been identified as causative for some
diseases of the brain and nervous system, for the most part
these disorders are believed to have polygenic etiologies.

For example, over the past few decades
Alzheimer's disease (AD), once considered a rare disorder,
has become recognized as a major public health problem;
over 4,000,000 people in the United States are now
estimated to suffer with various stages of this
progressive, degenerative brain disorder.

Although there is no agreement on the exact
incidence or prevalence of Alzheimer's disease, in part due
to varying diagnostic criteria and difficulties of
differential diagnosis among dementias, the studies are
consistent in pointing to an exponential rise in prevalence
of this disease with age. After age 65, the percentage of
affected people approximately doubles with every decade of
life, regardless of definition. Among people age 85 or
older, studies suggest that 25 to 35 percent have dementia,
including Alzheimer's disease; one study reports that 47.2
percent of people over age 85 have Alzheimer's disease,
exclusive of other dementias.

Alzheimer's disease progressively destroys
memory, reason, judgment, language, and, eventually, the
ability to carry out even the simplest of tasks. Anatomic
changes associated with Alzheimer's disease begin in the
entorhinal cortex, proceed to the hippocampus, and then
gradually spread to other regions, particularly the
cerebral cortex. Chief among such anatomic changes are the
presence of characteristic extracellular plaques and
internal neurofibrillary tangles.

Alzheimer's disease has been suspected to have a 
multifactorial genetic etiological component for almost 
half a century. Sjogren et al., Acta Psychiat. Neurol.
Scand. 82 (suppl.): 1-152 (1952).

At least four genes have been identified to date
that contribute to development of Alzheimer's disease: AD1
is caused by mutations in the amyloid precursor gene (APP);
AD2 is associated with the APOE4 allele on chromosome 19;
AD3 is caused by mutation in a chromosome 14 gene encoding
a 7-transmembrane domain protein, presenilin-1 (PSEN1); and
AD4 is caused by mutation in a gene on chromosome 1 that
encodes a similar 7-transmembrane domain protein,
presenilin-2 (PSEN2).

There is strong evidence, however, for
additional, as yet uncharacterized, AD loci on other
chromosomes.

For example, Daw et al., Am. J. Hum. Genet. 66:
196-204 (2000), estimated the number of additional
quantitative trait loci (QTLs) and their contribution to
the variance in age at onset of AD, and reported
that 4 loci make a contribution to the variance in age at
onset of late-onset AD similar to or greater in magnitude
than that made by apoE, with one locus making a
contribution several times greater than that of
apoE. These results suggest that several genes not yet
localized may play a larger role than does apoE in
late-onset AD.

In accord, three groups recently announced the
possible existence of an AD susceptibility gene on
chromosome 10. Bertram et al., Science 290(5500):2302-2303
(2000); Ertekin-Taner et al., Science 290(5500):2303-2304
(2000); and Myers et al., Science 290(5500):2304-2305
(2000).

As another example, multiple sclerosis (MS)
affects about 350,000 Americans, with approximately 200 new
cases diagnosed each week and an estimated annual
monetary cost in the U.S. alone of $2.5 billion.

Clinically, MS is an unpredictable disorder, with
symptoms, presentation and course falling broadly into one
of several clinical patterns. In relapsing-remitting (RR)
MS, the disease first manifests as a series of attacks
followed by complete or partial remissions, with symptoms
returning later after a period of stability. In
primary-progressive (PP) MS, there is a gradual clinical decline
with no distinct remissions, although there may be
temporary plateaus or minor relief from symptoms.
Secondary-progressive (SP) MS begins with a
relapsing-remitting course followed by a later primary-progressive
course. Rarely, patients may have a progressive-relapsing
(PR) course in which the disease takes a progressive
path punctuated by acute attacks. PP, SP, and PR MS are
sometimes lumped together and called chronic progressive
MS. The waxing and waning course characteristic of RR, SP
and PR MS makes differential diagnosis difficult.

Anatomically, MS attacks are associated with
focal inflammation in areas of the white matter of the
central nervous system (CNS), accompanied or followed by
demyelination in these areas, termed plaques. Destruction
of the myelin sheath slows or blocks neurological
transmission, leading to diminished or lost function.
Clinical manifestations depend upon the location of the
plaques and severity of demyelination, and range from
fatigue, the most common symptom of MS, to visual
impairment, due to inflammation of the optic nerve, termed
optic neuritis, to numbness and paresthesias, to focal
muscular weakness, ataxia, and bladder incontinence.

Increasing evidence suggests that genotype 
contributes to susceptibility to MS. 

As early as 1965, McAlpine, in Multiple
Sclerosis: A Reappraisal (McAlpine, ed.), Williams and
Wilkins Co. pp. 61-74 (1965), concluded that the risk to a
first-degree relative of a patient with multiple sclerosis
is at least 15 times that for a member of the general
population, but could discern no definite genetic pattern
of inheritance.

Subsequently, many studies associated MS with HLA
(MHC) haplotype. Haines et al., Hum. Molec. Genet.
7:1229-1234 (1998), studying a data set of 98 multiplex MS
families, confirmed earlier reports that genetic linkage to
the MHC can be explained by association with the HLA-DR2
allele, but suggested that MHC association explains only
between 17% and 62% of the genetic etiology of MS.

From a review of genomic screens, Dyment et al.,
Hum. Molec. Genet. 6: 1693-1698 (1997), concluded that a
number of genes with interacting effects are likely and
that no single region has a major influence on familial
risk. Chataway et al., Brain 121: 1869-1887 (1998),
reporting a follow-up on U.K. studies using a systematic
genome screen to determine the genetic basis of MS, stated
that a gene of major effect had been excluded from 95% of
the genome and one with a moderate role from 65%, results
thus suggesting that multiple sclerosis depends on
independent or epistatic effects of several genes, each
with small individual effects, rather than a very few genes
of major biologic importance.

As a yet further example, schizophrenia has long 
been recognized to have complex, likely polygenic, genetic 
contributions.

Schizophrenia is a common psychiatric disorder,
occurring in 1 to 1.5 percent of the population worldwide,
and is characterized by variable constellations of symptoms
drawn from a universe of behavioral abnormalities.
Although there are accepted alternative diagnostic
criteria, primary criteria for diagnosis require two or
more of the following, each present for a significant
portion of time during a 1-month period (or less
if successfully treated): (1) delusions; (2) hallucinations;
(3) disorganized speech (e.g., frequent derailment or
incoherence); (4) grossly disorganized or catatonic
behavior; (5) negative symptoms, i.e., affective
flattening, alogia, or avolition. (Diagnostic and
Statistical Manual of Mental Disorders DSM-IV-TR, American
Psychiatric Association (2000)). Only one such symptom is
required if delusions are bizarre or hallucinations
consist of a voice keeping up a running commentary on the
person's behavior or thoughts, or consist of two or more
voices conversing with each other.

Three-quarters of persons with schizophrenia
develop the disease between 16 and 25 years of age: onset
is uncommon after age 30, rare after age 40. In the 16 to
25 year old age group, schizophrenia affects more men than
women; in the 25-30 year old group, the incidence is higher
in women than in men. Studies have shown that some persons
with schizophrenia recover completely, and many others
improve to the point where they can live independently,
often with the maintenance of drug therapy. However,
approximately 15 percent of people with schizophrenia
respond only moderately to medication and require extensive
support throughout their lives, while another 15 percent
simply do not respond to existing treatment.

Schizophrenia has long been known to have a
significant genetic component. Studies have consistently
demonstrated that the risk to relatives of a proband with
schizophrenia is higher than the risk to relatives of
controls. Moldin, in Genetics and Mental Disorders: Report
of the NIMH Genetics Workgroup (NIH publication 98-4268)
(1998), reviewed family and twin studies published between
1920 and 1987 and found the recurrence risk ratios to be 48
for monozygotic twins, 11 for first-degree relatives, 4.25
for second-degree relatives, and 2 for third-degree
relatives. He also found that concordance rates for
monozygotic twins averaged 46%, even when reared in
different families, whereas the concordance rates for
dizygotic twins averaged only 14%. The prevalence of
schizophrenia is known to be higher in biologic than in
adoptive relatives of schizophrenic adoptees.

The mode of inheritance is unclear, however.
Susceptibility has been mapped to many loci, including
chromosomes 1q21-q22, 5, 6p23, 8p22-p21, 11q, 13q14-q21,
13q32, 15q15, 15q14, 18p, and 22q11. Chromosome
19 has also been implicated in schizophrenia, at 2
different sites, as have sites on the X chromosome. Wei et
al., Nature Genet. 25:376-377 (2000) report more
specifically that the NOTCH4 locus is associated with
susceptibility to schizophrenia.

In general, however, it is believed that
development of schizophrenia involves multiple loci. 

For example, Williams et al., Hum. Molec. Genet.
8:1729-1739 (1999) undertook a systematic search for
linkage in 196 affected sib pairs (ASPs) with
schizophrenia. Using 229 microsatellite markers at an
average intermarker distance of 17.26 cM, followed in a
second stage by a further 54 markers allowing the regions
identified in stage 1 to be typed at an average spacing of
5.15 cM, Williams et al. considered results on chromosomes
4p, 18q, and Xcen as suggestive; however, given the scores,
Williams et al. interpreted their results as suggesting
that common genes of major effect (susceptibility ratio
more than 3) are unlikely to exist for schizophrenia.

Similarly, Shaw et al., Am. J. Med. Genet.
81(5):364-76 (1998), in a genome-wide search for
schizophrenia susceptibility genes, found that twelve
chromosomes (1, 2, 4, 5, 8, 10, 11, 12, 13, 14, 16, and
22) had at least one region with a nominal P value <0.05,
that two of these chromosomes had a nominal P value <0.01
(chromosomes 13 and 16), and that five chromosomes (1, 2,
4, 11, and 13) had at least one marker with a lod score
>2.0, suggesting the existence of multiple loci that
contribute to schizophrenia susceptibility.

As yet another example, multiple genes are
thought to predispose to epilepsy.

Epilepsy is characterized by recurrent,
paroxysmal disorders of cerebral function (seizures); that
is, by sudden, brief attacks of altered consciousness,
motor activity, sensory phenomena, or inappropriate
behavior. The risk of developing epilepsy is 1% in the
period from birth to age 20, and 3% at age 75.

Epilepsy is caused by excessive discharge of
cerebral neurons. Clinical manifestations depend on the
type and location of discharge. In partial seizures, for
example, the excess neuronal discharge is contained within
one region of the cerebral cortex. Simple partial
seizures consist of motor, sensory, or psychomotor
phenomena without loss of consciousness; the specific
phenomenon reflects the affected area of the brain. In
generalized seizures, the discharge bilaterally and
diffusely involves the entire cortex. Sometimes a focal
lesion of one part of a hemisphere activates the entire
cerebrum bilaterally so rapidly that it produces a
generalized tonic-clonic seizure before a focal sign
appears.

Epilepsy is a family of disorders. Those that
are idiopathic are believed to have multiple genetic
contributions. For example, idiopathic generalized
epilepsy (IGE) is characterized by recurring
generalized seizures in the absence of detectable brain
lesions and/or metabolic abnormalities. Twin and family
studies suggest that genetic factors play a key part in its
etiology. Although a mutation in the CACNB4 gene can cause
the disorder, linkage to 8q24, Zara et al., Hum. Molec.
Genet. 4:1201-1207 (1995), 3q26 and 14q23, Sander et al.,
Hum. Molec. Genet. 9:1465-1472 (2000), and 2q36 has
also been demonstrated, with a multilocus model appearing to
fit best the observed familial patterns.

Polygenic contributions to the etiology of
various neurologic cancers have similarly been described.

For example, gliomas account for 45% of
intracranial tumors, and multiple loci have been implicated
in their development, with losses of chromosome 17p, increase
in copy number of chromosome 7, structural abnormalities of
chromosomes 9p and 19q, and genes on chromosome 10 among
the suspects.

Other significant diseases of brain and nervous
tissue are also believed to have a genetic, typically
polygenic, etiologic component. These diseases include, for
example, Parkinson's disease, dementia with Lewy
bodies, frontotemporal dementia, corticobasal ganglionic
degeneration, progressive supranuclear palsy, prion
diseases (Creutzfeldt-Jakob, Gerstmann-Straussler-Scheinker,
familial fatal insomnia), Tourette's Syndrome, corticobasal
degeneration, multiple system atrophy, striatonigral
degeneration, Shy-Drager syndrome, olivopontocerebellar
atrophy, spinocerebellar ataxia, Friedreich ataxia,
ataxia-telangiectasia, amyotrophic lateral sclerosis,
bulbospinal atrophy (Kennedy's syndrome), spinal muscular
atrophy, neuronal storage diseases (sphingolipid,
mucopolysaccharide, mucolipid), leukodystrophy, Krabbe
disease, metachromatic leukodystrophy, adrenoleukodystrophy,
Pelizaeus-Merzbacher disease, Canavan disease,
mitochondrial encephalomyopathy, Leigh disease,
neurofibromatosis (Type 1 and Type II), tuberous sclerosis,
paraneoplastic syndrome, subacute cerebellar degeneration,
subacute sensory neuropathy, opsoclonus/myoclonus, retinal
degeneration, stiff-man syndrome and Von Hippel-Lindau
disease.

Many neurologic cancers other than gliomas have
also been shown or suspected to have genetic bases or
contributions. Among these cancers are astrocytoma,
fibrillary astrocytoma, pilocytic astrocytoma,
pleomorphic xanthoastrocytoma, oligodendroglioma,
ependymoma, gangliocytoma, ganglioglioma, medulloblastoma,
primary brain germ cell tumor, pineocytoma, pineoblastoma,
and meningioma.

Other disorders of brain and central nervous
system that likely have genetic components include the
various forms of neural deafness, catatonia, depression,
bipolar (manic-depressive) disorder, Wilson's Disease, Pick
disease, neuromyelitis optica (Devic disease), central
pontine myelinolysis, Marchiafava-Bignami disease,
Guillain-Barre syndrome, sleep disorders (insomnia,
myoclonus, narcolepsy, cataplexy, sleep apnea), amnesia,
aphasias (including Broca's aphasia and Wernicke's
aphasia), cortical blindness, visual agnosia, auditory
agnosia, and Kluver-Bucy syndrome.

The human genome-derived single exon nucleic acid
probes and microarrays of the present invention are useful
for predicting, diagnosing, grading, staging, monitoring
and prognosing diseases of human brain, particularly those
diseases with polygenic etiology. With each of the single
exon probes described herein shown to be expressed at
detectable levels in human brain, and with about 2/3 of the
probes identifying novel genes, the single exon microarrays
of the present invention provide exceptionally high
informational content for such studies.

For example, diagnosis (including differential
diagnosis among clinically indistinguishable disorders),
staging, and/or grading of a disease can be based upon the
quantitative relatedness of a patient gene expression
profile to one or more reference expression profiles known
to be characteristic of a given neurologic disease, or to
specific grades or stages thereof.

In one embodiment, the patient gene expression
profile is generated by hybridizing nucleic acids obtained
directly or indirectly from transcripts expressed in the
patient's brain (or other CNS tissues, including cultured
tissues) to the genome-derived single exon microarray of
the present invention. Reference profiles are obtained
similarly by hybridizing nucleic acids from individuals
with known disease. Methods for quantitatively relating
gene expression profiles, without regard to the function of
the protein encoded by the gene, are disclosed in WO
99/58720, incorporated herein by reference in its entirety.
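
By way of non-limiting illustration only, the comparison of a
patient profile to reference profiles can be sketched in
Python as follows. The sketch uses Pearson correlation as a
simple stand-in measure of relatedness; it is not the method
of WO 99/58720, whose algorithms are not reproduced here. The
function names pearson and rank_references, and the
representation of each profile as a list of normalized
expression values in a fixed probe order, are assumptions
made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a profile with no variance cannot be correlated")
    return sxy / math.sqrt(sxx * syy)


def rank_references(patient, references):
    """Order reference profiles from most to least related to the patient.

    `references` maps a label (e.g. a disease or stage name) to a
    profile measured over the same probes, in the same order.
    """
    scores = {name: pearson(patient, profile)
              for name, profile in references.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

The first entry of the returned list names the reference
profile most closely related to the patient profile under
this measure; any other similarity measure could be
substituted without changing the surrounding logic.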

In another approach, the genome-derived single
exon probes and microarrays of the present invention can be
used to interrogate genomic DNA, rather than pools of
expressed message; this latter approach permits
predisposition to and/or prognosis of neurologic disease to
be assessed through the massively parallel determination of
altered copy number, deletion, or mutation in the patient's
genome of exons known to be expressed in human brain. The
algorithms set forth in WO 99/58720 can be applied to such
genomic profiles without regard to the function of the
protein encoded by the interrogated gene.

The utility is specific to the probe; at
sufficiently high hybridization stringency, which
stringencies are well known in the art (see Ausubel et al.
and Maniatis et al.), each probe reports the level of
expression of message specifically containing that ORF.

It should be appreciated, however, that the
probes of the present invention, for which expression in
the brain has been demonstrated, are useful both for
measurement in the brain and for survey of expression in
other tissues.

Significant among such advantages is the presence
of probes for novel genes.

As mentioned above and further detailed in
Examples 1 and 2, the methods described enable ORFs which
are not present in existing expression databases to be
identified. And the fewer the number of tissues in which
the ORF can be shown to be expressed, the more likely the
ORF will prove to be part of a novel gene: as further
discussed in Example 2, ORFs whose expression was
measurable in only a single one of the tested tissues were
represented in existing expression databases at a rate of
only 11%, whereas 36% of ORFs whose expression was
measurable in 9 tissues were present in existing expression
databases, and fully 45% of those ORFs expressed in all ten
tested tissues were present in existing expressed sequence
databases.

Either as tools for measuring gene expression or
tools for surveying gene expression, the genome-derived
single exon probes of the present invention have
significant advantages over the cDNA or EST-based probes
that are currently available for achieving these utilities.

The genome-derived single exon probes of the
present invention are useful in constructing genome-derived
single exon microarrays; the genome-derived single exon
microarrays, in turn, are useful devices for measuring and
for surveying gene expression in the human.

Gene expression analysis using microarrays,
conventionally using microarrays having probes derived from
expressed message, is well-established as useful in the
biological research arts (see Lockhart et al. Nature 405,
827-836).

Microarrays have been used to determine gene
expression profiles in cells in response to drug treatment
(see, for example, Kaminski et al., "Global Analysis of
Gene Expression in Pulmonary Fibrosis Reveals Distinct
Programs Regulating Lung Inflammation and Fibrosis," Proc.
Natl. Acad. Sci. USA 97(4):1778-83 (2000); Bartosiewicz et
al., "Development of a Toxicological Gene Array and
Quantitative Assessment of This Technology," Arch. Biochem.
Biophys. 376(1):66-73 (2000)), viral infection (see, for
example, Geiss et al., "Large-scale Monitoring of Host Cell
Gene Expression During HIV-1 Infection Using cDNA
Microarrays," Virology 266(1):8-16 (2000)) and during cell
processes such as differentiation, senescence and apoptosis
(see, for example, Shelton et al., "Microarray Analysis of
Replicative Senescence," Curr. Biol. 9(17):939-45 (1999);
Voehringer et al., "Gene Microarray Identification of Redox
and Mitochondrial Elements That Control Resistance or
Sensitivity to Apoptosis," Proc. Natl. Acad. Sci. USA
97(6):2680-5 (2000)).

Microarrays have also been used to determine
abnormal gene expression in diseased tissues (see, for
example, Alon et al., "Broad Patterns of Gene Expression
Revealed by Clustering Analysis of Tumor and Normal Colon
Tissues Probed by Oligonucleotide Arrays," Proc. Natl.
Acad. Sci. USA 96(12):6745-50 (1999); Perou et al.,
"Distinctive Gene Expression Patterns in Human Mammary
Epithelial Cells and Breast Cancers," Proc. Natl. Acad. Sci.
USA 96(16):9212-7 (1999); Wang et al., "Identification of
Genes Differentially Over-expressed in Lung Squamous Cell
Carcinoma Using Combination of cDNA Subtraction and
Microarray Analysis," Oncogene 19(12):1519-28 (2000);
Whitney et al., "Analysis of Gene Expression in Multiple
Sclerosis Lesions Using cDNA Microarrays," Ann. Neurol.
46(3):425-8 (1999)), in drug discovery screens (see, for
example, Scherf et al., "A Gene Expression Database for the
Molecular Pharmacology of Cancer," Nat. Genet. 24(3):236-44
(2000)) and in diagnosis to determine appropriate treatment
strategies (see, for example, Sgroi et al., "In vivo Gene
Expression Profile Analysis of Human Breast Cancer
Progression," Cancer Res. 59(22):5656-61 (1999)).

In microarray-based gene expression screens of
pharmacological drug candidates upon cells, each probe
provides specific useful data. In particular, it should be
appreciated that even those probes that show no change in
expression are as informative as those that do change,
serving, in essence, as negative controls.

For example, where gene expression analysis is
used to assess toxicity of chemical agents on cells, the
failure of the agent to change a gene's expression level is
evidence that the drug likely does not affect the pathway
of which the gene's expressed protein is a part.
Analogously, where gene expression analysis is used to
assess side effects of pharmacological agents (whether in
lead compound discovery or in subsequent screening of lead
compound derivatives), the inability of the agent to alter
a gene's expression level is evidence that the drug does
not affect the pathway of which the gene's expressed
protein is a part.

WO 99/58720 provides methods for quantifying the
relatedness of a first and second gene expression profile
and for ordering the relatedness of a plurality of gene
expression profiles. The methods so described permit
useful information to be extracted from a greater
percentage of the individual gene expression measurements
from a microarray than methods previously used in the art.

Other uses of microarrays are described in
Gerhold et al., Trends Biochem. Sci. 24(5):168-173 (1999);
Zweiger, Trends Biotechnol. 17(11):429-436 (1999); and
Schena et al.

The invention particularly provides genome-derived
single-exon probes known to be expressed in brain.

The individual single exon probes can be provided 
in the form of substantially isolated and purified nucleic 
acid, typically, but not necessarily, in a quantity 
sufficient to perform a hybridization reaction. 

Such nucleic acid can be in any form directly
hybridizable to the message that contains the probe's ORF,
such as double-stranded DNA, single-stranded DNA
complementary to the message, single-stranded RNA
complementary to the message, or chimeric DNA/RNA molecules
so hybridizable. The nucleic acid can alternatively or
additionally include either nonnative nucleotides,
alternative internucleotide linkages, or both, so long as
complementary binding can be obtained. For example, probes
can include phosphorothioates, methylphosphonates,
morpholino analogs, and peptide nucleic acids (PNA), as are
described, for example, in U.S. Patent Nos. 5,142,047;
5,235,033; 5,166,315; 5,217,866; 5,184,444; 5,861,250.

Usefully, however, such probes are provided in a
form and quantity suitable for amplification, where the
amplified product is thereafter to be used in the
hybridization reactions that probe gene expression.
Typically, such probes are provided in a form and quantity
suitable for amplification by PCR or by another well known
amplification technique. One such technique additional to
PCR is rolling circle amplification, as is described, inter
alia, in U.S. Patent Nos. 5,854,033 and 5,714,320 and
international patent publications WO 97/19193 and
WO 00/15779. As is well understood, where the probes are
to be provided in a form suitable for amplification, the
range of nucleic acid analogues and/or internucleotide
linkages will be constrained by the requirements and nature
of the amplification enzyme.

Where the probe is to be provided in a form
suitable for amplification, the quantity need not be
sufficient for direct hybridization for gene expression
analysis, and need be sufficient only to function as an
amplification template, typically at least about 1, 10 or
100 pg or more.

Each discrete amplifiable probe can also be
packaged with amplification primers, either in a single
composition that comprises probe template and primers, or
in a kit that comprises such primers separately packaged
therefrom. As earlier mentioned, the ORF-specific
5' primers used for genomic amplification can have a first
common sequence added thereto, and the ORF-specific 3'
primers used for genomic amplification can have a second,
different, common sequence added thereto, thus permitting,
in this embodiment, the use of a single set of 5' and 3'
primers to amplify any one of the probes. The probe
composition and/or kit can also include buffers, enzyme,
etc., required to effect amplification.

As mentioned earlier, when intended for use on a
genome-derived single exon microarray of the present
invention, the genome-derived single exon probes of the
present invention will typically average at least about
100, 200, 300, 400 or 500 bp in length, including (and
typically, but not necessarily, centered about) the ORF.
Furthermore, when intended for use on a genome-derived
single exon microarray of the present invention, the
genome-derived single exon probes of the present invention
will typically not contain a detectable label.

When intended for use in solution phase
hybridization, however (that is, for use in a
hybridization reaction in which the probe is not first
bound to a support substrate, although the target may
indeed be so bound), length constraints that are imposed
in microarray-based hybridization approaches will be
relaxed, and such probes will typically be labeled.

In such case, the only functional constraint that
dictates the minimum size of such probe is that each such
probe must be capable of specifically identifying in a
hybridization reaction the exon from which it is drawn. In
theory, a probe of as little as 17 nucleotides is capable
of uniquely identifying its cognate sequence in the human
genome. For hybridization to expressed message (a subset
of target sequence that is much reduced in complexity as
compared to genomic sequence), even fewer nucleotides are
required for specificity.
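
The 17-nucleotide figure can be checked with a short
calculation. The sketch below is illustrative only and rests
on stated assumptions: a genome of about 3.5 x 10^9 bp
(extrapolated from the statement in Example 1 that about
350 MB is roughly 10% of the human genome), random sequence
with equal base frequencies, both strands counted as target,
and a hypothetical complexity of 1 x 10^8 nt for expressed
message. None of these values is taken from the probe data
themselves.

```python
def min_unique_length(complexity_nt: float) -> int:
    """Return the smallest n for which 4**n exceeds the target complexity."""
    n = 1
    while 4 ** n <= complexity_nt:
        n += 1
    return n


# Assumption: ~3.5e9 bp genome, with both strands available as target.
GENOME_BP = 3.5e9
genome_min = min_unique_length(2 * GENOME_BP)

# Assumption (hypothetical figure): expressed message complexity of ~1e8 nt.
MESSAGE_NT = 1e8
message_min = min_unique_length(MESSAGE_NT)

print(genome_min, message_min)
```

Under these assumptions the calculation gives 17 nucleotides
for a genomic target and a smaller number for expressed
message, in line with the reasoning above.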

Therefore, the probes of the present invention
can include as few as 20, 25 or 50 bp of ORF, or more. In
particular embodiments, the ORF sequences are given in SEQ
ID NOS. 12,822 - 25,434, respectively, for probe SEQ ID
NOS. 1 - 12,821. The minimum amount of ORF required to be
included in the probe of the present invention in order to
provide specific signal in either solution phase or
microarray-based hybridizations can readily be determined
for each of ORF SEQ ID NOS. 12,822 - 25,434 individually by
routine experimentation using standard high stringency
conditions.

Such high stringency conditions are described,
inter alia, in Ausubel et al. and Maniatis et al. For
microarray-based hybridization, standard high stringency
conditions can usefully be 50% formamide, 5X SSC, 0.2 µg/µl
poly(dA), 0.2 µg/µl human Cot-1 DNA, and 0.5% SDS, in a
humid oven at 42°C overnight, followed by successive washes
of the microarray in 1X SSC, 0.2% SDS at 55°C for 5
minutes, and then 0.1X SSC, 0.2% SDS, at 55°C for 20
minutes. For solution phase hybridization, standard high
stringency conditions can usefully be aqueous hybridization
at 65°C in 6X SSC. Lower stringency conditions, suitable
for cross-hybridization to mRNA encoding structurally- and
functionally-related proteins, can usefully be the same as
the high stringency conditions but with reduction in
temperature for hybridization and washing to room
temperature (approximately 25°C).

When intended for use in solution phase
hybridization, the maximum size of the single exon probes
of the present invention is dictated by the proximity of
other expressed exons in genomic DNA: although each single
exon probe can include intergenic and/or intronic material
contiguous to the ORF in the human genome, each probe of
the present invention will include portions of only one
expressed exon.

Thus, each single exon probe will include no more
than about 25 kb of contiguous genomic sequence, more
typically no more than about 20 kb of contiguous genomic
sequence, more usually no more than about 15 kb, even more
usually no more than about 10 kb. Usually, probes that are
maximally about 5 kb will be used, more typically no more
than about 3 kb.

It will be appreciated that the Sequence Listing
appended hereto presents, by convention, only that strand
of the probe and ORF sequence that can be directly
translated reading from 5' to 3' end. As would be well
understood by one of skill in the art, single stranded
probes must be complementary in sequence to the ORF as
present in an mRNA; it is well within the skill in the art
to determine such complementary sequence. It will further
be understood that double stranded probes can be used in
both solution-phase hybridization and microarray-based
hybridization if suitably denatured.
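
Deriving the complementary strand from a listed sequence is a
routine string operation. The following minimal sketch
assumes plain DNA letters (A, C, G, T) and uses a made-up
nine-base sequence rather than any entry from the Sequence
Listing; the function name is illustrative.

```python
# Translation table mapping each DNA base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(listed_strand: str) -> str:
    """Return the strand complementary to the listed (coding) strand, 5' to 3'."""
    return listed_strand.translate(_COMPLEMENT)[::-1]


# Hypothetical coding-strand fragment, not taken from the Sequence Listing.
coding = "ATGGCCTTA"
probe = reverse_complement(coding)
print(probe)  # TAAGGCCAT
```

Applying the function twice returns the original sequence,
which is a convenient sanity check on any such conversion.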

Thus, it is an aspect of the present invention to
provide single-stranded nucleic acid probes that have
sequence complementary to those described herein above and
below, and double-stranded probes one strand of which has
sequence complementary to the probes described herein.

The probes can, but need not, contain intergenic
and/or intronic material that flanks the ORF, on one or
both sides, in the same linear relationship to the ORF that
the intergenic and/or intronic material bears to the ORF in
genomic DNA. The probes do not, however, contain nucleic
acid derived from more than one expressed ORF.

And when intended for use in solution
hybridization, the probes of the present invention can
usefully have detectable labels. Nucleic acid labels are
well known in the art, and include, inter alia, radioactive
labels, such as ³H, ³²P, ³³P, ³⁵S, ¹²⁵I, ¹³¹I; fluorescent
labels, such as Cy3, Cy5, Cy5.5, Cy7, SYBR®
Green and other labels described in Haugland,
Handbook of Fluorescent Probes and Research Chemicals, 7th
ed., Molecular Probes Inc., Eugene, OR (2000), or
fluorescence resonance energy transfer tandem conjugates
thereof; labels suitable for chemiluminescent and/or
enhanced chemiluminescent detection; labels suitable for
ESR and NMR detection; and labels that include one member
of a specific binding pair, such as biotin, digoxigenin, or
the like.

The probes, either in quantity sufficient for
hybridization or sufficient for amplification, can be
provided in individual vials or containers.

Alternatively, such probes can usefully be
packaged as a plurality of such individual genome-derived
single exon probes.

When provided as a collection of plural
individual probes, the probes are typically made available
in amplifiable form in a spatially-addressable ordered set,
typically one per well of a microtiter dish. Although a 96
well microtiter plate can be used, greater efficiency is
obtained using higher density arrays.

If, as earlier mentioned, the ORF-specific
5' primers used for genomic amplification had a first
common sequence added thereto, and the ORF-specific 3'
primers used for genomic amplification had a second,
different, common sequence added thereto, a single set of
5' and 3' primers can be used to amplify all of the probes
from the amplifiable ordered set.

Such collections of genome-derived single exon
probes can usefully include a plurality of probes chosen
for the common attribute of expression in the human brain.

In such defined subsets, typically at least 50,
60, 75, 80, 85, 90 or 95% or more of the probes will be
chosen by their expression in the defined tissue or cell
type.

The single exon probes of the present invention,
as well as fragments of the single exon probes comprising
selectively hybridizable portions of the probe ORF, can be
used to obtain the full length cDNA that includes the ORF
by (i) screening of cDNA libraries; (ii) rapid
amplification of cDNA ends ("RACE"); or (iii) other
conventional means, as are described, inter alia, in
Ausubel et al. and Maniatis et al.

It is another aspect of the present invention to
provide genome-derived single exon nucleic acid microarrays
useful for gene expression analysis, where the term
"microarray" has the meaning given in the definitional
section of this description, supra.

The invention particularly provides genome-derived
single-exon nucleic acid microarrays comprising a
plurality of probes known to be expressed in human brain.
In preferred embodiments, the present invention provides
human genome-derived single exon microarrays comprising a
plurality of probes drawn from the group consisting of SEQ
ID NOS.: 1 - 12,821.

When used for gene expression analysis, the
genome-derived single exon microarrays provide greater
physical informational density than do genome-derived
single exon microarrays that have lower percentages of
probes known to be expressed commonly in the tested tissue.
At a fixed probe density, for example, a given microarray
surface area of the defined subset genome-derived single
exon microarray can yield a greater number of expression
measurements. Alternatively, at a given probe density, the
same number of expression measurements can be obtained from
a smaller substrate surface area. Alternatively, at a
fixed probe density and fixed surface area, probes can be
provided redundantly, providing greater reliability in
signal measurement for any given probe. Furthermore, with
a higher percentage of probes known to be expressed in the
assayed tissue, the dynamic range of the detection means
can be adjusted to reveal finer levels of discrimination
among the levels of expression.

Although particularly described with respect to
their utility as probes of gene expression, particularly as
probes to be included on a genome-derived single exon
microarray, each of the nucleic acids having SEQ ID NOS.: 1
- 12,821 contains an open-reading frame, set forth
respectively in SEQ ID NOS.: 12,822 - 25,434, that encodes
a protein domain. Thus, each of SEQ ID NOS. 1 - 12,821 can
be used, or that portion thereof in SEQ ID NOS. 12,822 -
25,434 used, to express a protein domain by standard in
vitro recombinant techniques. See Ausubel et al. and
Maniatis et al.

Additionally, kits are available commercially
that readily permit such nucleic acids to be expressed as
protein in bacterial cells, insect cells, or mammalian
cells, as desired (e.g., HAT™ Protein Expression &
Purification System, ClonTech Laboratories, Palo Alto, CA;
Adeno-X™ Expression System, ClonTech Laboratories, Palo
Alto, CA; Protein Fusion & Purification (pMAL™) System, New
England Biolabs, Beverly, MA).

Furthermore, shorter peptides can be chemically
synthesized using commercial peptide synthesizing equipment
and well known techniques. Procedures are described, inter
alia, in Chan et al. (eds.), Fmoc Solid Phase Peptide
Synthesis: A Practical Approach (Practical Approach Series,
(Paper)), Oxford Univ. Press (March 2000) (ISBN:
0199637245); Jones, Amino Acid and Peptide Synthesis
(Oxford Chemistry Primers, No 7), Oxford Univ. Press
(August 1992) (ISBN: 0198556683); and Bodanszky, Principles
of Peptide Synthesis (Springer Laboratory), Springer Verlag
(December 1993) (ISBN: 0387564314).

It is, therefore, another aspect of the invention
to provide peptides comprising an amino acid sequence
translated from SEQ ID NOS.: 12,822 - 25,434. Such amino
acid sequences are set out in SEQ ID NOS: 25,435 - 37,811.
Any such recombinantly-expressed or synthesized peptide of
at least 8, and preferably at least about 15, amino acids,
can be conjugated to a carrier protein and used to generate
antibody that recognizes the peptide. Thus, it is a
further aspect of the invention to provide peptides that
have at least 8, preferably at least 15, consecutive amino
acids.



The following examples are offered by way of
illustration and not by way of limitation.

EXAMPLE 1 

Preparation of Single Exon Microarrays from ORFs Predicted 
in Human Genomic Sequence 

Bioinformatics Results

All human BAC sequences in fewer than 10 pieces
that had been accessioned in a five month period
immediately preceding this study were downloaded from
GenBank. This corresponds to ~2200 clones, totaling ~350
MB of sequence, or approximately 10% of the human genome.

After masking repetitive elements using the
program CROSS_MATCH, the sequence was analyzed for open
reading frames using three separate gene finding programs.
The three programs predict genes using independent
algorithmic methods developed on independent training sets:
GRAIL uses a neural network, GENEFINDER uses a hidden
Markov model, and DICTION, a program proprietary to
Genetics Institute, operates according to a different
heuristic. The results of all three programs were used to
create a prediction matrix across the segment of genomic
DNA.

The three gene finding programs yielded a range
of results. GRAIL identified the greatest percentage of
genomic sequence as putative coding region, 2% of the data
analyzed. GENEFINDER was second, calling 1%, and DICTION
yielded the least putative coding region, with 0.8% of
genomic sequence called as coding region.

The consensus data were as follows. GRAIL and
GENEFINDER agreed on 0.7% of genomic sequence, GRAIL and
DICTION agreed on 0.5% of genomic sequence, and the three
programs together agreed on 0.25% of the data analyzed.
That is, 0.25% of the genomic sequence was identified by
all three of the programs as containing putative coding
region.

ORFs predicted by any two of the three programs
("consensus ORFs") were assorted into "gene bins" using two
criteria: (1) any 7 consecutive exons within a 25 kb window
were placed together in a bin as likely contributing to a
single gene, and (2) all ORFs within a 25 kb window were
placed together in a bin as likely contributing to a single
gene if fewer than 7 exons were found within the 25 kb
window.
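
The two binning criteria can be sketched as a greedy pass
over position-sorted ORFs. This is one possible reading of
the criteria, not the procedure actually used: it assumes
ORFs are given as (start, end) coordinates on a single
clone, that a bin is capped at 7 exons, and that the 25 kb
window is measured from the start of the first ORF in the
bin. The function names are illustrative.

```python
from typing import List, Tuple

Orf = Tuple[int, int]  # (start, end) coordinates on one genomic clone

WINDOW_BP = 25_000  # window size stated in the text
MAX_EXONS = 7       # exon cap per bin stated in the text


def bin_orfs(orfs: List[Orf]) -> List[List[Orf]]:
    """Group consensus ORFs into gene bins (assumed greedy interpretation)."""
    bins: List[List[Orf]] = []
    for orf in sorted(orfs):
        current = bins[-1] if bins else None
        fits = (
            current is not None
            and len(current) < MAX_EXONS
            and orf[1] - current[0][0] <= WINDOW_BP
        )
        if fits:
            current.append(orf)
        else:
            bins.append([orf])
    return bins


def largest_orf(gene_bin: List[Orf]) -> Orf:
    """Pick the longest ORF in a bin, as a candidate for amplification."""
    return max(gene_bin, key=lambda o: o[1] - o[0])


# Hypothetical coordinates for illustration only.
example = [(0, 100), (5_000, 5_400), (30_000, 30_200)]
example_bins = bin_orfs(example)
print(example_bins)
print([largest_orf(b) for b in example_bins])
```

In this example the first two ORFs share a bin and the third,
lying more than 25 kb downstream, starts a new one.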

PCR

The largest ORF from each gene bin that did not
span repetitive sequence was then chosen for amplification,
as were all consensus ORFs longer than 500 bp. This method
approximated one exon per gene; however, a number of genes
were found to be represented by multiple elements.

Previously, we had determined that DNA fragments
shorter than 250 bp do not bind well to the amino-modified
glass surface of the slides used as support
substrate for construction of microarrays; therefore,
amplicons were designed in the present experiments to
approximate 500 bp in length.

Accordingly, after selecting the largest ORF per
gene bin, a 500 bp fragment of sequence centered on the ORF
was passed to the primer picking software, PRIMER3
(available online for use at
http://www-genome.wi.mit.edu/cgi-bin/primer/). A first
additional sequence was commonly added to each ORF-unique
5' primer, and a second, different, additional sequence was
commonly added to each ORF-unique 3' primer, to permit
subsequent reamplification of the amplicon using a single
set of "universal" 5' and 3' primers, thus immortalizing
the amplicon. The addition of universal priming sequences
also facilitates sequence verification, and can be used to
add a cloning site should some ORFs be found to warrant
further study.

The ORFs were then PCR amplified from genomic 
DNA, verified on agarose gels, and sequenced using the 
universal primers to validate the identity of the amplicon 
to be spotted in the microarray. 

Primers were supplied by Operon Technologies
(Alameda, CA). PCR amplification was performed by standard
techniques using human genomic DNA (Clontech, Palo Alto,
CA) as template. Each PCR product was verified by SYBR®
green (Molecular Probes, Inc., Eugene, OR) staining of
agarose gels, with subsequent imaging by Fluorimager
(Molecular Dynamics, Inc., Sunnyvale, CA). PCR
amplification was classified as successful if a single band
appeared.

The success rate for amplifying ORFs of interest
directly from genomic DNA using PCR was approximately 75%.
FIG. 5 graphs the distribution of predicted ORF (exon)
length and distribution of amplified PCR products, with ORF
length shown in red and PCR product length shown in blue
(which may appear black in the figure). Although the range
of ORF sizes is readily seen to extend to beyond 900 bp,
the mean predicted exon size was only 229 bp, with a median
size of 150 bp (n=9498). With an average amplicon size of
475 ± 25 bp, approximately 50% of the average PCR
amplification product contained predicted coding region,
with the remaining 50% of the amplicon containing either
intron, intergenic sequence, or both.
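
The "approximately 50%" figure follows directly from the
reported means. The short calculation below uses only the
values stated above (mean exon size 229 bp, amplicon size
475 ± 25 bp); the variable names are illustrative, and
dividing one mean by the other is a simplification that
ignores the spread of individual exon sizes.

```python
MEAN_EXON_BP = 229       # mean predicted exon size reported above
MEAN_AMPLICON_BP = 475   # average amplicon size reported above
AMPLICON_SPREAD_BP = 25  # reported +/- spread of amplicon size

# Fraction of the average amplicon occupied by predicted coding region.
coding_fraction = MEAN_EXON_BP / MEAN_AMPLICON_BP

# Range of that fraction across the reported amplicon spread.
low = MEAN_EXON_BP / (MEAN_AMPLICON_BP + AMPLICON_SPREAD_BP)
high = MEAN_EXON_BP / (MEAN_AMPLICON_BP - AMPLICON_SPREAD_BP)

print(f"{coding_fraction:.0%} (range {low:.0%} to {high:.0%})")
```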

Using a strategy predicated on amplifying about
500 bp, it was found that long exons had a higher PCR
failure rate. To address this, the bioinformatics process
was adjusted to amplify 1000, 1500 or 2000 bp fragments
from exons larger than 500 bp. This improved the rate of
successful amplification of exons exceeding 500 bp,
which constitute about 9.2% of the exons predicted by the
gene finding algorithms.

Approximately 75% of the probes disposed on the
array (90% of those that successfully PCR amplified) were
sequence-verified by sequencing in both the forward and
reverse direction using a MegaBACE sequencer (Molecular
Dynamics, Inc., Sunnyvale, CA), universal primers, and
standard protocols.

Some genomic clones (BACs) yielded very poor PCR
and sequencing results. The reasons for this are unclear,
but may be related to the quality of early draft sequence
or the inclusion of vector and host contamination in some
submitted sequence data.

Although the intronic and intergenic material
flanking coding regions could theoretically interfere with
hybridization during microarray experiments, subsequent
empirical results demonstrated that differential expression
ratios were not significantly affected by the presence of
noncoding sequence. The variation in exon size was
similarly found not to affect differential expression
ratios significantly; however, variation in exon size was
observed to affect the absolute signal intensity (data not
shown).

The 350 MB of genomic DNA was, by the above-described
process, reduced to 9750 discrete probes, which
were spotted in duplicate onto glass slides using
commercially available instrumentation (MicroArray GenII
Spotter and/or MicroArray GenIII Spotter, Molecular
Dynamics, Inc., Sunnyvale, CA). Each slide additionally
included either 16 or 32 E. coli genes, the average
hybridization signal of which was used as a measure of
background biological noise.

Each of the probe sequences was BLASTed against
the human EST data set, the NR data set, and SwissProt
GenBank (May 7, 1999 release 2.0.9).

One third of the probe sequences (as amplified)
produced an exact match (BLAST Expect ("E") values less
than 1e-100) to either an EST (20% of sequences) or a known
mRNA (13% of sequences). A further 22% of the probe
sequences showed some homology to a known EST or mRNA
(BLAST E values from 1e-5 to 1e-99). The remaining 45% of
the probe sequences showed no significant sequence homology
to any expressed, or potentially expressed, sequences
present in public databases.

All of the probe sequences (as amplified) were
then analyzed for protein similarities with the SwissProt
database using BLASTX, Gish et al., Nature Genet. 3:266
(1993). The predicted functional breakdowns of the 2/3 of
probes identical or homologous to known sequences are
presented in Table 1.



Table 1

Function of Predicted ORFs As Deduced From Comparative
Sequence Analysis

Total    V6 chip    V7 chip    Function Predicted from Comparative Sequence Analysis
211      96         115        Receptor
120      43         77         Zinc Finger
30       11         19         Homeobox
25       9          16         Transcription Factor
17       11         7          Transcription
118      57         61         Structural
95       39         56         Kinase
36       18         18         Phosphatase
83       31         52         Ribosomal
45       19         26         Transport
21       17         14         Growth Factor
17       12         5          Cytochrome
50       33         17         Channel


As can be seen, the two most common types of
genes were transcription factors and receptors, making up
2.2% and 1.8% of the arrayed elements, respectively.

EXAMPLE 2 

Gene Expression Measurements From Genome-Derived Single 
Exon Microarrays 



The two genome-derived single exon microarrays 
prepared according to Example 1 were hybridized in a series 
of simultaneous two-color fluorescence experiments to (1) 
Cy3-labeled cDNA synthesized from message drawn 
individually from each of brain, heart, liver, fetal liver, 
placenta, lung, bone marrow, HeLa, BT 474, or HBL 100 
cells, and (2) Cy5-labeled cDNA prepared from message 
pooled from all ten tissues and cell types, as a control in 
each of the measurements. Hybridization and scanning were 
carried out using standard protocols and Molecular Dynamics
equipment.

Briefly, mRNA samples were bought from commercial
sources (Clontech, Palo Alto, CA and Amersham Pharmacia
Biotech (APB)). Cy3-dCTP and Cy5-dCTP (both from APB) were
incorporated during separate reverse transcriptions of 1 µg
of polyA+ mRNA performed using 1 µg oligo(dT)12-18 primer
and 2 µg random 9mer primers as follows. After heating to
70°C, the RNA:primer mixture was snap cooled on ice. The
following were then added to the RNA to the stated final
concentrations: 1X Superscript II buffer, 0.01 M DTT,
100 µM dATP, 100 µM dGTP, 100 µM dTTP, 50 µM dCTP, 50 µM
Cy3-dCTP or Cy5-dCTP, and 200 U Superscript II
enzyme. The reaction was incubated for 2 hours at 42°C.
After 2 hours, the first strand cDNA was isolated by adding
1 U Ribonuclease H, and incubating for 30 minutes at 37°C.
The reaction was then purified using a Qiagen PCR cleanup
column, increasing the number of ethanol washes to 5.
Probe was eluted using 10 mM Tris pH 8.5.

Using a spectrophotometer, probes were measured
for dye incorporation. Volumes of both Cy3 and Cy5 cDNA
corresponding to 50 pmoles of each dye were then dried in a
Speedvac, resuspended in 30 µl hybridization solution
containing 50% formamide, 5X SSC, 0.2 µg/µl poly(dA), 0.2
µg/µl human Cot-1 DNA, and 0.5% SDS.

Hybridizations were carried out under a
coverslip, with the array placed in a humid oven at 42°C
overnight. Before scanning, slides were washed in 1X SSC,
0.2% SDS at 55°C for 5 minutes, followed by 0.1X SSC, 0.2%
SDS, at 55°C for 20 minutes. Slides were briefly dipped in
water and dried thoroughly under a gentle stream of
nitrogen.

Slides were scanned using a Molecular Dynamics
Gen3 scanner, as described. Schena (ed.), Microarray
Biochip: Tools and Technology, Eaton Publishing
Company/BioTechniques Books Division (2000) (ISBN:
1881299376).

Although the use of pooled cDNA as a reference
permitted the survey of a large number of tissues, it
attenuates the measurement of relative gene expression,
since every highly expressed gene in the tissue/cell
type-specific fluorescence channel will be present to a level of
at least 10% in the control channel. Because of this fact,
both signal and expression ratios (the latter hereinafter,
"expression" or "relative expression") for each probe were
normalized using the average ratio or average signal,
respectively, as measured across the whole slide.

Data were accepted for further analysis only when
signal was at least three times greater than biological
noise, the latter defined by the average signal produced by
the E. coli control genes.
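
The normalization and acceptance steps described above can
be sketched as follows. This is a minimal illustration under
stated assumptions, not the analysis code actually used:
signals are held in a dictionary keyed by probe name, the
slide-wide average is a simple arithmetic mean, and the
function names and example values are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def normalize(signals: Dict[str, float]) -> Dict[str, float]:
    """Divide each probe signal by the average signal across the slide."""
    slide_average = mean(signals.values())
    return {probe: value / slide_average for probe, value in signals.items()}


def accept(
    normalized: Dict[str, float],
    control_signals: List[float],
    fold: float = 3.0,
) -> Dict[str, float]:
    """Keep probes whose signal is at least `fold` times biological noise.

    Noise is taken as the mean normalized signal of the E. coli controls.
    """
    noise = mean(control_signals)
    return {p: s for p, s in normalized.items() if s >= fold * noise}


# Hypothetical raw signals for three probes.
raw = {"probe_a": 2.0, "probe_b": 4.0, "probe_c": 6.0}
norm = normalize(raw)

# Hypothetical normalized signals of the E. coli control spots.
kept = accept(norm, control_signals=[0.2, 0.2])
print(sorted(kept))
```

With these example values, probe_a (normalized signal 0.5)
falls below three times the 0.2 noise level and is dropped,
while the other two probes are retained.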

The relative expression signal for these probes
was then plotted as function of tissue or cell type, and is
presented in FIG. 6.

FIG. 6 shows the distribution of expression
across a panel of ten tissues. The graph shows the number
of sequence-verified products that were either not
expressed ("0"), expressed in one or more but not all
tested tissues ("1" - "9"), and expressed in all tissues
tested ("10").

Of 9999 arrayed elements on the two microarrays
(including positive and negative controls and "failed"
products), 2353 (51%) were expressed in at least one tissue
or cell type. Of the gene elements showing significant
signal (where expression was scored as "significant" if
the normalized Cy3 signal was greater than 1, representing
signal 5-fold over biological noise (0.2)), 39% (991) were
expressed in all 10 tissues. The next most common class
(15%) consisted of gene elements expressed in only a single
tissue.

The genes expressed in a single tissue were
further analyzed, and the results of the analyses are
compiled in FIG. 7.

FIG. 7A is a matrix presenting the expression of
all verified sequences that showed expression greater than
3 in at least one tissue. Each clone is represented by a
column in the matrix. Each of the 10 tissues assayed is
represented by a separate row in the matrix, and relative
expression of a clone in that tissue is indicated at the
respective node by intensity of green shading, with the
intensity legend shown in panel B. The top row of the
matrix ("EST Hit") contains "bioinformatic" rather than
"physical" expression data; that is, it presents the results
returned by query of EST, NR and SwissProt databases using
the probe sequence. The legend for "bioinformatic
expression" (i.e., degree of homology returned) is
presented in panel C. Briefly, white is known, black is
novel, with gray depicting nonidentical with significant
homology (white: E values < 1e-100; gray: E values from
1e-05 to 1e-99; black: E values > 1e-05).
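
The three-way shading rule of panel C amounts to a simple
threshold classification on the BLAST E value. The sketch
below uses the cutoffs stated in the legend; the function
name and the category labels returned are illustrative, and
the handling of values falling exactly on a boundary is an
assumption, since the legend does not specify it.

```python
KNOWN_CUTOFF = 1e-100  # below this: "known" (white)
NOVEL_CUTOFF = 1e-5    # above this: "novel" (black)


def classify_hit(e_value: float) -> str:
    """Map a BLAST E value to the legend category of panel C."""
    if e_value < KNOWN_CUTOFF:
        return "known"        # white
    if e_value <= NOVEL_CUTOFF:
        return "homologous"   # gray: nonidentical, significant homology
    return "novel"            # black


# Hypothetical E values for illustration.
for e in (1e-120, 1e-50, 1e-3):
    print(e, classify_hit(e))
```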

As FIG. 7 readily shows, heart and brain were
demonstrated to have the greatest numbers of genes that
were shown to be uniquely expressed in the respective
tissue. In brain, 200 uniquely expressed genes were
identified; in heart, 150. The remaining tissues gave the
following figures for uniquely expressed genes: liver, 100;
lung, 70; fetal liver, 150; bone marrow, 75; placenta, 100;
HeLa, 50; HBL, 100; and BT474, 50.

It was further observed that there were many more
"novel" genes among those that were up-regulated in only
one tissue, as compared with those that were down-regulated
in only one tissue. In fact, it was found that ORFs whose
expression was measurable in only a single one of the tested
tissues were represented in sequencing databases at a rate
of only 11%, whereas 36% of the ORFs whose expression was
measurable in 9 of the tissues were present in public
databases. As for those ORFs expressed in all ten tissues,
fully 45% were present in existing expressed sequence
databases. These results are not unexpected, since genes
expressed in a greater number of tissues have a higher
likelihood of being, and thus of having been, discovered by
EST approaches.



Comparison of Signal from Known and Unknown Genes 

The normalized signal of the genes found to have
high homology to genes present in the GenBank human EST
database was compared to the normalized signal of those
genes not found in the GenBank human EST database. The
data are shown in FIG. 8.

FIG. 8 shows the normalized Cy3 signal intensity
for all sequence-verified products with a BLAST Expect
("E") value of greater than 1e-30 (designated "unknown")
upon query of existing EST, NR and SwissProt databases, and
shows in blue the normalized Cy3 signal intensity for all
sequence-verified products with a BLAST Expect value of
less than 1e-30 ("known"). Note that biological background
noise has an averaged normalized Cy3 signal intensity of - 
0.2. 
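
The two-class split used for FIG. 8 can be illustrated with a minimal Python sketch. The function name and the sample E values are hypothetical; only the 1e-30 boundary comes from the text. The text does not say how a value exactly equal to 1e-30, or a query with no hit at all, is treated; this sketch assumes both count as "unknown".

```python
def classify_known(e_value, cutoff=1e-30):
    """Label a product from its best BLAST Expect value.

    Assumption: no hit (None) and values at or above the cutoff are "unknown".
    """
    if e_value is None:
        return "unknown"
    return "known" if e_value < cutoff else "unknown"


# Hypothetical example values, not data from the experiments described here.
examples = {"probe_a": 1e-120, "probe_b": 3e-12, "probe_c": None}
labels = {name: classify_known(e) for name, e in examples.items()}
```

Changing the `cutoff` argument gives other boundaries without altering the logic.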

As expected, the most highly expressed of the
ORFs were "known" genes. This is not surprising, since
very high signal intensity correlates with very commonly
expressed genes, which have a higher likelihood of being
found by EST sequencing.

However, a significant point is that a large
number of even the high expressers were "unknown". Since
the genomic approach used to identify genes and to confirm
their expression does not bias exons toward either the 3'
or 5' end of a gene, many of these high expression genes
will not have been detected in an end-sequenced cDNA
library.

The significant point is that presence of the
gene in an EST database is not a prerequisite for
incorporation into a genome-derived microarray, and
further, that arraying such "unknown" exons can help to
assign function to as-yet undiscovered genes.

Verification of Gene Expression 

To ascertain the validity of the approach
described above to identify genes from raw genomic
sequence, expression of two of the probes was assayed using
reverse transcriptase polymerase chain reaction (RT-PCR)
and northern blot analysis.

Two microarray probes were selected on the basis
of exon size, prior sequencing success, and tissue-specific
gene expression patterns as measured by the microarray
experiments. The primers originally used to amplify the
two respective ORFs from genomic DNA were used in RT-PCR
against a panel of tissue-specific cDNAs (Rapid-Scan gene
expression panel 24 human cDNAs) (OriGene Technologies,
Inc., Rockville, MD).

Sequence AL079300_1 was shown by microarray
hybridization to be present in cardiac tissue, and sequence
AL031734_1 was shown by microarray experiment to be present
in placental tissue (data not shown). RT-PCR on these two
sequences confirmed the tissue-specific gene expression as
measured by microarrays, as ascertained by the presence of
a correctly sized PCR product from the respective tissue
type cDNAs.

Clearly, all microarray results cannot, and
indeed should not, be confirmed by independent assay
methods, or the high throughput, highly parallel advantages
of microarray hybridization assays will be lost. However,
in addition to the two RT-PCR results presented above, the
observation that 1/3 of the arrayed genes exist in
expression databases provides strong confirmation of the
power of our methodology (which combines bioinformatic
prediction with expression confirmation using genome-derived
single exon microarrays) to identify novel genes
from raw genomic data.

To verify that the approach further provides
correct characterization of the expression patterns of the
identified genes, a detailed analysis was performed of the
microarrayed sequences that showed high signal in brain.

For this latter analysis, sequences that showed
high (normalized) signal in brain, but which showed very
low (normalized) signal (less than 0.5, determined to be
biological noise) in all other tissues, were further
studied. There were 82 sequences that fit these criteria,
approximately 2% of the arrayed elements. The 10 sequences
showing the highest signal in brain in microarray
hybridizations are detailed in Table 2, along with assigned
function, if known or reasonably predicted.
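
The selection just described can be sketched in Python as follows. This is an illustrative sketch only: the probe names and signal values are hypothetical, and because the text does not state a numerical cutoff for "high" signal in brain, the sketch assumes that any brain signal at or above the 0.5 noise level qualifies.

```python
NOISE_CUTOFF = 0.5  # normalized signal below this is treated as biological noise


def brain_specific(signals, min_brain=NOISE_CUTOFF, tissue="brain"):
    """Return (probe, brain signal) pairs, highest brain signal first.

    signals maps probe name -> {tissue name: normalized signal}.
    A probe is kept when its signal in `tissue` is at least `min_brain`
    (an assumed cutoff) and its signal in every other tissue is below noise.
    """
    hits = []
    for probe, by_tissue in signals.items():
        target = by_tissue[tissue]
        others = [v for t, v in by_tissue.items() if t != tissue]
        if target >= min_brain and all(v < NOISE_CUTOFF for v in others):
            hits.append((probe, target))
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical normalized signals for four probes in three tissues.
signals = {
    "probe_1": {"brain": 5.2, "heart": 0.1, "liver": 0.3},
    "probe_2": {"brain": 2.3, "heart": 0.9, "liver": 0.2},
    "probe_3": {"brain": 0.4, "heart": 0.1, "liver": 0.1},
    "probe_4": {"brain": 1.7, "heart": 0.2, "liver": 0.4},
}
ranked = brain_specific(signals)
```

Taking the first ten entries of the ranked list would correspond to the selection of the ten highest-signal sequences.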



Table 2

Function of the Most Highly Expressed Genes Expressed Only in Brain

Microarray     Normalized  Expression  Homology to EST     Gene Function
Sequence Name  Signal      Ratio       present in GenBank  as described by GenBank

AP000217-1     5.2         +7.7        High                S-100 protein, b-chain, Ca2+ binding protein expressed in central nervous system
AP000047-1     2.3                     High                Unknown function
AC006548-9     1.7                     High                Similar to mouse membrane glycoprotein M6, expressed in central nervous system
AC007245-5     1.5                     High                Similar to amphiphysin, a synaptic vesicle-associated protein. Ref 21
L44140-4       1.2         +2.0        High                Endothelial actin-binding protein found in nonmuscle filamin
AC004689-9     1.2         +3.5        High                Protein phosphatase PP2A, neuronal/downregulates activated protein kinases
AL031657-1     1.2         +3.0        High                Unknown function/Contains the ankyrin motif, a common protein sequence motif
AC009266-2     1.1         +3.7        Low                 Low homology to the Synaptotagmin I protein in rat/present at low levels throughout rat brain
AP000086-1     1.0         +2.7        Low                 Unknown, very poor homology to collagen
AC004689-3     1.0                     High                Protein phosphatase PP2A, neuronal/downregulates activated protein kinases



Of the ten sequences studied by these latter
confirmatory approaches, eight were previously known. Of
these eight, six had previously been reported to be
important in the central nervous system or brain. The exon
giving the highest signal (AP000217-1) was found to be the
gene encoding an S100B Ca2+ binding protein, reported in
the literature to be highly and uniquely expressed in the
central nervous system. Heizmann, Neurochem. Res. 9:1097
(1997).

A number of the brain-specific probe sequences
(including AC006548-9, AC009266-2) did not have homology to
any known human cDNAs in GenBank but did show homology to
rat and mouse cDNAs. Sequences AC004689-9 and AC004689-3
were both found to be phosphatases present in neurons
(Millward et al., Trends Biochem. Sci. 24(5):186-191
(1999)). Two microarray sequences, AP000047-1 and
AP000086-1, have unknown function, with AP000086-1 being
absent from GenBank. Functionality can now be narrowed
down to a role in the central nervous system for both of
these genes, showing the power of designing microarrays in
this fashion.

Next, the function of the chip sequences with the
highest (normalized) signal intensity in brain, regardless
of expression in other tissues, was assessed. In this
latter analysis, we found expression of many more common
genes, since the sequences were not limited to those
expressed only in brain. For example, looking at the 20
highest signal intensity spots in brain, 4 were similar to
tubulin (AC00807905; AF146191-2; AC007664-4; AF14191-2), 2
were similar to actin (AL035701-2; AL034402-1), and 6 were
found to be homologous to glyceraldehyde-3-phosphate
dehydrogenase (GAPDH) (AL035604-1; Z86090-1; AC006064-L,
AC006064-K; AC035604-3; AC006064-L). These genes are often
used as controls or housekeeping genes in microarray
experiments of all types.

Other interesting genes highly expressed in brain
were a ferritin heavy chain protein, which is reported in
the literature to be found in brain and liver (Joshi et
al., J. Neurol. Sci. 134 (Suppl):52-56 (1995)), a result
duplicated with the array. Other highly expressed chip
sequences included a translation elongation factor ID
(AC007564-4), a DEAD-box homolog (AL023804-4), and a
Y-chromosome RNA-binding motif (Chai et al., Genomics
49(2):283-89 (1998)) (AC007320-3). A low homology analog
(AP00123-1/2) to a gene, DSCR1, thought to be involved in
trisomy 21 (Down's syndrome), showed high expression in
both brain and heart, in agreement with the literature
(Fuentes et al., Mol. Genet. 4(10):1935-44 (1995)).

As a further validation of the approach, we
selected the BAC AC006064 to be included on the array.
This BAC was known to contain the GAPDH gene, and thus
could be used as a control for the ORF selection process.
The gene finding and exon selection algorithms resulted in
choosing 25 exons from BAC AC006064 for spotting onto the
array, of which four were drawn from the GAPDH gene. Table
3 shows the comparison of the average expression ratio for
the 4 exons from BAC AC006064 compared with the average
expression ratio for 5 different dilutions of a
commercially available GAPDH cDNA (Clontech).

Table 3

Comparison of Expression Ratio, for each tissue, of GAPDH

Tissue        AC006064 (n = 4)   [heading illegible]

Bone Marrow   -1.81 ± 0.11       -1.85 ± 0.08
Brain         -1.41 ± 0.11       [illegible]
BT474         [illegible]        [illegible]
Fetal Liver   [illegible]        [illegible]
HBL 100       [illegible]        [illegible]
Heart         1.16 ± 0.09        1.56 ± 0.10
HeLa          1.11 ± 0.06        1.30 ± 0.15
Liver         -1.62 ± 0.22       -2.07 ± [illegible]
Lung          -4.95 ± 0.93       -3.75 ± 0.21
Placenta      -3.56 ± 0.25       -3.52 ± 0.43


Each tissue shows excellent agreement between the
experimentally chosen exons and the control, again
demonstrating the validity of the present exon mining
approach. In addition, the data also show the variability
of expression of GAPDH within tissues, calling into
question its classification as a housekeeping gene and
utility as a housekeeping control in microarray
experiments.

EXAMPLE 3 

Representation of Sequence and Expression Data as a 
"Mondrian" 

For each genomic clone processed for microarray
as above-described, a plethora of information was
accumulated, including full clone sequence, probe sequence
within the clone, results of each of the three gene finding
programs, EST information associated with the probe
sequences, and microarray signal and expression for
multiple tissues, challenging our ability to display the
information.

Accordingly, we devised a new tool for visual
display of the sequence with its attendant annotation
which, in deference to its visual similarity to the
paintings of Piet Mondrian, is hereinafter termed a
"Mondrian". FIGS. 3 and 4 present the key to the
information presented on a Mondrian.

FIG. 9 presents a Mondrian of BAC AC008172 (bases
25,000 to 130,000 shown), containing the carbamyl phosphate
synthetase gene (AF154830.1). Purple background within the
region shown as field 81 in FIG. 3 indicates all 37 known
exons for this gene.

As can be seen, GRAIL II successfully identified
27 of the known exons (73%), GENEFINDER successfully
identified 37 of the known exons (100%), while DICTION
identified 7 of the known exons (19%).
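
The percentages quoted above follow directly from the counts given for the 37 known exons. The short Python sketch below reproduces the arithmetic; the variable names are illustrative only.

```python
KNOWN_EXONS = 37  # known exons of the carbamyl phosphate synthetase gene

# Number of known exons identified by each gene finding program (from the text).
found = {"GRAIL II": 27, "GENEFINDER": 37, "DICTION": 7}

# Percentage of known exons identified, rounded to the nearest whole percent.
sensitivity = {prog: round(100 * n / KNOWN_EXONS) for prog, n in found.items()}
```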

Seven of the predicted exons were selected for
physical assay, of which 5 successfully amplified by PCR
and were sequenced. These five exons were all found to be
from the same gene, the carbamyl phosphate synthetase gene
(AF154830.1).

The five exons were arrayed, and gene expression
measured across 10 tissues. As is readily seen in the
Mondrian, the five chip sequences on the array show
identical expression patterns, elegantly demonstrating the
reproducibility of the system.

FIG. 10 is a Mondrian of BAC AL049839. We
selected 12 exons from this BAC, of which 10 successfully
sequenced, which were found to form between 5 and 6 genes.
Interestingly, 4 of the genes on this BAC are protease
inhibitors. Again, these data elegantly show that exons
selected from the same gene show the same expression
patterns, depicted below the red line. From this figure,
it is clear that our ability to find known genes is very
good. A novel gene is also found from 86.6 kb to 88.6 kb,
upon which all the exon finding programs agree. We are
confident we have two exons from a single gene since they
show the same expression patterns and the exons are
proximal to each other. Backgrounds in the following
colors indicate a known gene (top to bottom):
red = kallistatin protease inhibitor (P29622);
purple = plasma serine protease inhibitor (P05154);
turquoise = α1 anti-chymotrypsin (P01011); mauve = 40S
ribosomal protein (P08865). Note that chip sequences 8 and
12 did not sequence verify.



EXAMPLE 4

Genome-Derived Single Exon Probes Useful For Measuring 
Human Gene Expression 

The protocols set forth in Examples 1 and 2,
supra, were applied to additional human genomic sequence as
it became newly available in GenBank to identify unique
exons in the human genome that could be shown to be
expressed at significant levels in brain tissue.

These unique exons are within longer probe
sequences. Each probe was completely sequenced on both
strands prior to its use on a genome-derived single exon
microarray; sequencing confirms the exact chemical
structure of each probe. An added benefit of sequencing is
that it placed us in possession of a set of single
base-incremented fragments of the sequenced nucleic acid,
starting from the sequencing primer 3' OH. (Since the
single exon probes were first obtained by PCR amplification
from genomic DNA, we were of course additionally in
possession of an even larger set of single base-incremented
fragments of each of the 12,821 single exon probes, each
fragment corresponding to an extension product from one of
the two amplification primers.)

The structures of the 12,821 unique single exon
probes are clearly presented in the Sequence Listing as SEQ
ID NOs.: 1 - 12,821. The 16 nt 5' primer sequence and 16
nt 3' primer sequence present on the amplicon are not
included in the sequence listing. The sequences of the
exons present within each of these probes are presented in
the Sequence Listing as SEQ ID NOs.: 12,822 - 25,434,
respectively. It will be noted that some amplicons have
more than one exon, and some exons are contained in more than
one amplicon.

As detailed in Example 2, expression was
demonstrated by disposing the amplicons as single exon
probes on nucleic acid microarrays and then performing
two-color fluorescent hybridization analysis; significant
expression is based on a statistical confidence that the
signal is significantly greater than negative biological
control spots. The negative biological control is formed
from spotted DNA sequences from a different species. Here,
32 sequences from E. coli were spotted in duplicate to give
a total of 64 spots.

For each hybridisation (each slide, each colour)
the median value of the signal from all of the spots is
determined. The normalised signal value is the arithmetic
mean of the signal from duplicate spots divided by the
population median.

Control spots are eliminated if there is more
than a five-fold difference between the raw signals of the
duplicate spots.

The median of the signal from the remaining
control spots is calculated and all subsequent calculations
are done with normalised signals.

Control spots having a signal of greater than
median + 2.4 (the value 2.4 is roughly 12 times the
observed standard deviation of control spot populations)
are eliminated. Spots with such high signals are considered
to be "outliers".

The mean and standard deviation of the modified
control spot populations are calculated.

The mean + 3x the standard deviation (mean +
(3*SD)) is used as the signal threshold qualifier for that
particular hybridisation. Thus, individual thresholds are
determined for each channel and each hybridisation.

This means that, assuming that the data are
distributed normally, there is a 99% confidence that any
signal exceeding the threshold is significant.
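
The threshold calculation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the software actually used: the function name, argument names and sample values are hypothetical, and because the text does not say whether a sample or population standard deviation was used, the sketch assumes the sample standard deviation.

```python
from statistics import mean, median, stdev


def signal_threshold(all_spots, control_pairs, max_fold=5.0, outlier_margin=2.4):
    """Return the mean + 3*SD threshold for one channel of one hybridisation.

    all_spots: raw signals of every spot (used for the population median).
    control_pairs: (raw signal, raw signal) for each duplicated control spot.
    """
    pop_median = median(all_spots)
    # Eliminate control pairs whose raw duplicate signals differ more than five-fold.
    kept = [(a, b) for a, b in control_pairs if max(a, b) <= max_fold * min(a, b)]
    # Normalised signal: mean of the duplicates divided by the population median.
    normalised = [((a + b) / 2) / pop_median for a, b in kept]
    # Eliminate outliers lying above the control median + 2.4.
    ctrl_median = median(normalised)
    trimmed = [s for s in normalised if s <= ctrl_median + outlier_margin]
    # Assumption: sample standard deviation (statistics.stdev).
    return mean(trimmed) + 3 * stdev(trimmed)


# Hypothetical raw signals, chosen only to exercise each step.
all_spots = [100, 200, 300, 400, 500]
control_pairs = [(30, 30), (60, 60), (90, 90), (10, 100), (1500, 1500)]
threshold = signal_threshold(all_spots, control_pairs)
```

In this example the pair (10, 100) is dropped by the five-fold rule and the pair (1500, 1500) is dropped as an outlier, leaving three control values from which the threshold is computed.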

The probes and their expression data are
presented in Table 4, set forth in Example 5.

Example 5 presents the subset of probes that is
significantly expressed in the human brain and thus
presents the subset of probes that was recognized to be
useful for measuring expression of their cognate genes in
human brain tissue.

The sequence of each of the exon probes
identified by SEQ ID NOS.: 12,822 - 25,434 was individually
used as a BLAST (or, for SWISSPROT, BLASTX) query to
identify the most similar sequence in each of dbEST,
SwissProt (BLASTX), and NR divisions of GenBank. Because
the query sequences are themselves derived from genomic
sequence in GenBank, only nongenomic hits from NR were
scored.

The smallest in value of the BLAST (or BLASTX)
expect ("E") scores for each query sequence across the
three database divisions was used as a measure of the
"expression novelty" of the probe's ORF. Table 4 is sorted
in descending order based on this measure, reported as
"Most Similar (top) Hit BLAST E Value". Those sequences for
which no "Hit E Value" is listed are those exons which were
found to have no similar sequences.

As sorted, Table 4 thus lists its respective
probes (by "AMPLICON SEQ ID NO.:" and additionally by the
SEQ ID NO. of the exon contained within the probe: "EXON
SEQ ID NO.:") from least similar to sequences known to be
expressed (i.e., highest BLAST E value), at the beginning
of the table, to most similar to sequences known to be
expressed (i.e., lowest BLAST E value), at the bottom of
the table.
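
The "expression novelty" measure and the resulting sort order can be sketched in Python as follows. The exon names and E values are hypothetical. The text does not state where exons with no hit in any database are placed in the sorted table; this sketch assumes that they are treated as the least similar and therefore listed first.

```python
import math


def best_e_value(hits):
    """hits maps database name -> E value (or None); return the smallest, or None."""
    values = [e for e in hits.values() if e is not None]
    return min(values) if values else None


def sort_by_novelty(probes):
    """Order probe names from highest best E value (least similar) to lowest."""
    def key(name):
        best = best_e_value(probes[name])
        # Assumption: an exon with no hit anywhere sorts ahead of all others.
        return math.inf if best is None else best
    return sorted(probes, key=key, reverse=True)


# Hypothetical top-hit E values in the three database divisions.
probes = {
    "exon_a": {"dbEST": 2e-40, "SwissProt": 1e-3, "NR": 5e-110},
    "exon_b": {"dbEST": None, "SwissProt": 0.8, "NR": None},
    "exon_c": {"dbEST": None, "SwissProt": None, "NR": None},
    "exon_d": {"dbEST": 1e-20, "SwissProt": 4e-7, "NR": 1e-9},
}
order = sort_by_novelty(probes)
```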

Table 4 further provides, for each listed probe,
the accession number of the database sequence that yielded
the "Most Similar (top) Hit BLAST E Value", along with the
name of the database in which the database sequence is
found ("Top Hit Database Source").

Table 4 further provides SEQ ID NOS.
corresponding to the predicted amino acid sequences where
they have been determined for the probe and exon nucleotide
sequences. These are set out as "PEPTIDE SEQ ID NOS.:". The
peptide sequences for a given exon are predicted as
follows: Since each chip exon is a consensus sequence drawn
from predictions from various exon finding programs (i.e.,
Grail, GeneFinder and GenScan), the multiple initial ORFs
are first determined in a uniform way according to each
prediction. In particular, the reading frame for predicting
the first amino acid in the peptide sequence always starts
with the first base of any codon and ends with the last
base of a non-termination codon. Next, for each strand of the
exon, initial ORFs are merged into one or more final ORFs
in an exhaustive process based on the following criteria:
1) the merging ORFs must be overlapping, and 2) the merging
ORFs must be in the same frame.
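
The two merging criteria can be illustrated with a short Python sketch for a single strand. This is a simplified illustration, not the procedure actually used: ORFs are represented here as hypothetical half-open (start, end) coordinates within the exon, and the reading frame is assumed to be given by the start coordinate modulo 3.

```python
def merge_orfs(orfs):
    """Merge overlapping, same-frame ORFs on one strand.

    orfs: list of (start, end) half-open intervals in exon coordinates.
    Assumption: the frame of an ORF is start % 3.
    """
    by_frame = {}
    for start, end in orfs:
        by_frame.setdefault(start % 3, []).append((start, end))

    merged = []
    for frame in sorted(by_frame):
        current_start, current_end = None, None
        for start, end in sorted(by_frame[frame]):
            if current_end is not None and start < current_end:
                # Same frame and overlapping: extend the ORF being built.
                current_end = max(current_end, end)
            else:
                if current_start is not None:
                    merged.append((current_start, current_end))
                current_start, current_end = start, end
        merged.append((current_start, current_end))
    return merged


# Hypothetical initial ORFs from different exon finding programs.
final_orfs = merge_orfs([(0, 30), (21, 60), (1, 40), (90, 120), (31, 61)])
```

Here (0, 30) and (21, 60) share a frame and overlap, so they merge into (0, 60); (1, 40) overlaps both but lies in a different frame and is merged only with (31, 61).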

The Sequence Listing, which is a superset of all
of the data presented in Table 4, further includes, for
each probe, the most similar hit, with accession number and
BLAST E value, from each of the three queried
databases.

Table 4 further lists, for each probe, a portion
of the descriptor for the top hit ("Top Hit Descriptor") as
provided in the sequence database. For those ORFs that are
similar in sequence, but nonidentical to known sequences
(e.g., those with BLAST E values between about 1e-05 and
1e-100), the descriptor reveals the likely function of the
protein encoded by the probe's ORF.

Using BLAST E value cutoffs of 1e-05 (i.e., 1 x
10^-5) and 1e-100 (i.e., 1 x 10^-100) as evidence of similarity
to sequences known to be expressed is of course arbitrary:
in Example 2, supra, a BLAST E value of 1e-30 was used as
the boundary when only two classes were to be defined for
analysis (unknown, >1e-30; known, <1e-30) (see also FIG. 8).
Furthermore, even when the "Most Similar (Top) Hit BLAST E
Value" is low, e.g., less than about 1e-100 (which is
probative evidence that the query sequence has previously
been shown to be expressed), the top hit is highly unlikely
to match the probe sequence exactly.

First, such expression entries typically will not
have the intronic and/or intergenic sequence present within
the single exon probes listed in the Table. Second, even
the ORF itself is unlikely in such cases to be present
identically in the databases, since most of the EST and
mRNA clones in existing databases include multiple exons,
without any indication of the location of exon boundaries.

As noted, the data presented in Table 4 represent
a proper subset of the data present within the attached
sequence listing. For each amplicon probe (SEQ ID NOs.: 1
- 12,821) and probe exon (SEQ ID NOs.: 12,822 - 25,434,
respectively), the sequence listing further provides,
through iterated annotation fields <220> and <223>:

(a) the accession number of the BAC from which
the sequence was derived ("MAP TO"), thus providing a link
to the chromosomal map location and other information about
the genomic milieu of the probe sequence;

(b) the most similar sequence provided by BLAST
query of the EST database, with accession number and BLAST
E value for the "hit";

(c) the most similar sequence provided by BLAST
query of the GenBank NR database, with accession number and
BLAST E value for the "hit"; and

(d) the most similar sequence provided by BLASTX
query of the SWISSPROT database, with accession number and
BLAST E value for the "hit".

EXAMPLE 5 

Genome-Derived Single Exon Probes Useful For Measuring 
Expression of Genes in Human Brain 

Table 4 (536 pages) presents expression, homology, and
functional information for the genome-derived single exon
probes that are expressed significantly in human brain.

1. A spatially-addressable set of single exon nucleic acid
probes for measuring gene expression in a sample derived
from human brain comprising a plurality of single exon nucleic
acid probes, said probes comprising any one of the nucleotide
sequences set out in SEQ ID NOs: 1 - 12,821 or a
complementary sequence, or a portion of such a sequence.

2. A spatially-addressable set of single exon nucleic acid
probes as claimed in claim 1 wherein each of said plurality
of probes is separately and addressably amplifiable.

3. A spatially-addressable set of single exon nucleic acid
probes as claimed in claim 1 wherein each of said plurality
of probes is separately and addressably isolatable from
said plurality.

4. A spatially-addressable set of single exon nucleic acid
probes as claimed in any of claims 1 to 3 wherein said
probes comprise any one of the nucleotide sequences set out
in SEQ ID NOS.: 12,822 - 25,434.

5. A spatially-addressable set of single exon nucleic acid
probes as claimed in any of claims 1 to 4, wherein each of
said plurality of probes is amplifiable using at least one
common primer.

6. A spatially-addressable set of single exon nucleic acid
probes as claimed in any of claims 1 to 5 wherein the set
comprises between 50 - 20,000 single exon nucleic acid
probes.

7. A spatially-addressable set of single exon nucleic acid
probes as claimed in any of claims 1 to 6, wherein the




WO 01/57275 PCT/US01/00667 

average length of the single exon nucleic acid probes is 
between 200 and 500 bp. 



8. A spatially-addressable set of single exon nucleic acid
probes as claimed in any of claims 1 to 7, wherein at least
50% of said single exon nucleic acid probes lack
prokaryotic and bacteriophage vector sequence.

9. A spatially-addressable set of single exon nucleic acid
probes as claimed in any of claims 1 to 8, wherein at least
50% of said single exon nucleic acid probes lack
homopolymeric stretches of A or T.

10. A spatially-addressable set of single exon nucleic acid
probes as claimed in any of claims 1-9 characterised in
that said set of probes is addressably disposed upon a
substrate.

11. A spatially-addressable set of single exon nucleic acid
probes as claimed in claim 10 wherein said substrate is
selected from glass, amorphous silicon, crystalline silicon
and plastic.

12. A microarray comprising a spatially addressable set of
single exon nucleic acid probes as claimed in any of claims
1 - 11.

13. A single exon nucleic acid probe for measuring human
gene expression in a sample derived from human brain
comprising a nucleotide sequence as set out in any of SEQ
ID NOs.: 1 - 12,821 or a complementary sequence or a
fragment thereof wherein said probe hybridizes at high
stringency to a nucleic acid molecule expressed in the
human brain.

14. A single exon nucleic acid probe as claimed in claim 13
comprising a nucleotide sequence as set out in any of SEQ
ID NOs.: 12,822 - 25,434 or a complementary sequence or a
fragment thereof.

15. A single exon nucleic acid probe for measuring human
gene expression in a sample derived from human brain which
is a nucleic acid molecule having a sequence encoding a
peptide comprising a peptide sequence as set out in any of
SEQ ID NOs.: 25,435 - 37,811, or a complementary sequence
or a fragment thereof wherein said probe hybridizes at high
stringency to a nucleic acid expressed in the human brain.

16. A single exon nucleic acid probe as claimed in any one
of claims 13 to 15 wherein said single exon nucleic acid
probe comprises between 15 and 25 contiguous nucleotides of
said SEQ ID NO.

17. A single exon nucleic acid probe as claimed in any one
of claims 13 to 15, wherein said probe is between 3 - 25 kb
in length.

18. A single exon nucleic acid probe as claimed in any one
of claims 13 - 17, wherein said probe is DNA, RNA or PNA.

19. A single exon nucleic acid probe as claimed in any one
of claims 13 - 18, wherein said probe is detectably
labeled.

20. A single exon nucleic acid probe as claimed in any one
of claims 13 - 19, wherein said probe lacks prokaryotic and
bacteriophage vector sequence.

21. A single exon nucleic acid probe as claimed in any one
of claims 13 - 20, wherein said probe lacks homopolymeric
stretches of A or T.

22. A method of measuring gene expression in a sample
derived from human brain, comprising:

contacting the microarray of claim 12 with a first
collection of detectably labeled nucleic acids,
said first collection of nucleic acids derived
from mRNA of human brain; and then
measuring the label detectably bound to each probe of
said microarray.

23. A method of identifying exons in a eukaryotic genome,
comprising:

algorithmically predicting at least one exon from
genomic sequence of said eukaryote; and then
detecting specific hybridization of detectably labeled
nucleic acids to a single exon probe,
wherein said detectably labeled nucleic acids are derived
from mRNA from the brain of said eukaryote, said probe is a
single exon probe having a fragment identical in sequence
to, or complementary in sequence to, said predicted exon,
said probe is included within a microarray according to
claim 12, and said fragment is selectively hybridizable at
high stringency.

24. A method of assigning exons to a single gene,
comprising:

identifying a plurality of exons from genomic
sequence according to the method of claim 23; and
then
measuring the expression of each of said exons in a
plurality of tissues and/or cell types using
hybridization to single exon microarrays having a
probe with said exon,
wherein a common pattern of expression of said exons in
said plurality of tissues and/or cell types indicates that
the exons should be assigned to a single gene.

25. A nucleic acid sequence as set out in any of SEQ ID
NOs: 1 - 25,434 which encodes a peptide.

26. A peptide encoded by a sequence as set out in any of 
SEQ ID Nos: 1 - 25,434. 

27. A peptide comprising a sequence as set out in any of
SEQ ID NOs: 25,435 - 37,811.

Top Hit Descriptor

Lycopersicon esculentum Mill. GTPase (SAR2) mRNA, complete cds
Lycopersicon esculentum Mill. GTPase (SAR2) mRNA, complete cds
RC0-HT0613-200300-031-807 HT0613 Homo sapiens cDNA
ZINC-FINGER PROTEIN 1 (ZINC-FINGER HOMEODOMAIN PROTEIN 1)
ZINC-FINGER PROTEIN 1 (ZINC-FINGER HOMEODOMAIN PROTEIN 1)
Arabidopsis thaliana DNA chromosome 4, contig fragment No. 91
HYPOTHETICAL 17.3 KDA PROTEIN IN MRDA-PHPB INTERGENIC REGION
[illegible]
WD-40 REPEAT PROTEIN MSI3
60S RIBOSOMAL PROTEIN L4 (L2)
DNA MISMATCH REPAIR PROTEIN MUTS
SKT5 PROTEIN
za07d11.r1 Soares melanocyte 2NbHM Homo sapiens cDNA clone IMAGE:291860 5'
za07d11.r1 Soares melanocyte 2NbHM Homo sapiens cDNA clone IMAGE:291860 5'
OUTER CAPSID PROTEIN VP4 (HEMAGGLUTININ) (OUTER LAYER PROTEIN VP4) [CONTAINS: OUTER CAPSID PROTEINS VP5 AND VP8]
HYPOTHETICAL 157.0 KDA PROTEIN C38C10.5 IN CHROMOSOME III
CATECHOL-O-METHYLTRANSFERASE, SOLUBLE FORM (S-COMT)
602152573F1 NIH_MGC_81 Homo sapiens cDNA clone IMAGE:4293427 5'
URIDYLATE KINASE (UK) (URIDINE MONOPHOSPHATE KINASE) (UMP KINASE)
URIDYLATE KINASE (UK) (URIDINE MONOPHOSPHATE KINASE) (UMP KINASE)
PROBABLE CATION-TRANSPORTING ATPASE C6C3.05C
ENV POLYPROTEIN [CONTAINS: COAT PROTEIN GP52; COAT PROTEIN GP36]
Schizophyllum commune unknown mRNA
Mus musculus mannosidase 2, alpha B1 (Man2b1), mRNA
601465031F1 NIH_MGC_67 Homo sapiens cDNA clone IMAGE:3871303 5'
Pyrococcus horikoshii OT3 genomic DNA, 1166001-1485000 nt position (6/7)
Deinococcus radiodurans R1 section 1 of 2 of the complete chromosome 2
Deinococcus radiodurans R1 section 1 of 2 of the complete chromosome 2
Mus musculus mixed lineage kinase 3 (Mlk3) and two pore domain K+ channel subunit (Kcnk6) genes, complete cds
Homo sapiens DESC1 protein (DESC1), mRNA
Mus musculus immunoglobulin scavenger receptor IgSR mRNA, complete cds


Top Hit 
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Source 
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[SWISSPROT | 


SWISSPROT | 


z 


ISWISSPROT I 


ISWISSPROT J 


ISWISSPROT | 


ISWISSPROT | 


ISWISSPROT | 


ISWISSPROT | 


|EST_HUMAN | 
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SWISSPROT 


ISWISSPROT | 


ISWISSPROT | 
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|P03374 
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iBE780163.1 


|AP000006.1 


|AE001862.1 


|AE001862.1 


AF155142.1 


CD 


|AF302046.1 
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(Top) Hit 
BLAST E 
Value 
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ORFSEQ 
ID NO: 


| 28390] 


| 28391| 


j 32713| 


32800| 


32801 | 




! 37350 1 


CO 


37165) 


34011| 


36107| 


36125) 


33623| 


33624 1 




CO 




32216) 


35B27| 


35828| 




; 34931| 


36067| 


I 35488| 


36337| 


| 32717| 


i 35565| 


1 36274| 
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[ 3281 6 1 
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| 19667| 


| 19740) 


| 19740| 


| 22151| 


| 24047| 


1 22532| 


23878| 


| 20875|


i 22897) 
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20503] 


i 20503| 
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| 18010| 


CO 

CM 
CO 
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| 22622| 


23743 1 


| 21768| 


| 22851| 


| 22294] 


| 23106| 


| 10871) 
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| 16270| 


| 19752 1 
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Top Hit Descriptor 


GENE 68 PROTEIN


Pan troglodytes novel repetitive solo LTR element in the RNU2 locus


50S RIBOSOMAL PROTEIN L4


602247838F1 NIH_MGC_62 Homo sapiens cDNA clone IMAGE:4333209 5'


CYCLIN-DEPENDENT KINASE INHIBITOR 1B (CYCLIN-DEPENDENT KINASE INHIBITOR P27)
(P27KIP1)


HYPOTHETICAL PROTEIN HVLF1


601507510F1 NIH_MGC_71 Homo sapiens cDNA clone IMAGE:3909051 5'


GLC7-INTERACTING PROTEIN 1


SUCRASE-ISOMALTASE, INTESTINAL [CONTAINS: SUCRASE; ISOMALTASE]


SUCRASE-ISOMALTASE, INTESTINAL [CONTAINS: SUCRASE; ISOMALTASE]


SUCRASE-ISOMALTASE, INTESTINAL [CONTAINS: SUCRASE; ISOMALTASE]


SUCRASE-ISOMALTASE, INTESTINAL [CONTAINS: SUCRASE; ISOMALTASE]


CELL DIVISION PROTEIN FTSY HOMOLOG


HYPOTHETICAL PROTEIN KIAA0144


NITRIC-OXIDE SYNTHASE (NOS, TYPE 1) (NEURONAL NOS) (N-NOS) (NNOS)


Ureaplasma urealyticum section 33 of 59 of the complete genome


CYTOCHROME C OXIDASE POLYPEPTIDE III


GENOME POLYPROTEIN [CONTAINS: CAPSID PROTEIN C (CORE PROTEIN); MATRIX PROTEIN
(ENVELOPE GLYCOPROTEIN M); MAJOR ENVELOPE PROTEIN E; NONSTRUCTURAL PROTEINS
NS1, NS2A, NS2B, NS4A AND NS4B; HELICASE (NS3); RNA-DIRECTED RNA POLYMERASE (NS5)]


GENOME POLYPROTEIN [CONTAINS: CAPSID PROTEIN C (CORE PROTEIN); MATRIX PROTEIN
(ENVELOPE GLYCOPROTEIN M); MAJOR ENVELOPE PROTEIN E; NONSTRUCTURAL PROTEINS
NS1, NS2A, NS2B, NS4A AND NS4B; HELICASE (NS3); RNA-DIRECTED RNA POLYMERASE (NS5)]


N.tabacum chitinase gene 50 for class I chitinase C


Mus musculus seminal vesicle secretory protein 99 (MSVSP99) gene, promoter region


MRO-BN0070-300500-028-h05 BN0070 Homo sapiens cDNA


Human hereditary haemochromatosis region, histone 2A-like protein gene, hereditary haemochromatosis
(HLA-H) gene, RoRet gene, and sodium phosphate transporter (NPT3) gene, complete cds


HYPOTHETICAL TRANSCRIPTIONAL REGULATOR IN AIDB-RPSF INTERGENIC REGION


Top Hit 
Database 
Source 


SWISSPROT


NT


SWISSPROT


EST_HUMAN


SWISSPROT


SWISSPROT


EST_HUMAN


SWISSPROT


SWISSPROT


SWISSPROT


SWISSPROT


SWISSPROT


SWISSPROT


SWISSPROT


SWISSPROT


NT


SWISSPROT


SWISSPROT


SWISSPROT


NT


NT


EST_HUMAN


NT


SWISSPROT


c 

Is 

X 


P28984 \ 


U57503.1 i 


P11253 { 


CO 
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to 


P46414 


to 

ii 
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BE885880.1 | 


CO 

0_ 


062653 i 


062653 1 


062653 j 
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033010 ! 


Q14157 j 


061309 i 


AE002132.1 i 


P14548 


P07564 
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CO 
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Is 
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LL U 

< c 


BE81 4357.1 


U91328.1 
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Value 
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Signal 


4.32| 


2.53 1 


o 




0.48 


3.06] 


11.691 


0.95 
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0.751 


0.75I 


1.441 


0.45I 


0.44| 


.0.631 


1,531 
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5 

■«r 
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0.55 
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ORF SEQ 
ID NO: 
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346111 
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22853 


23483 1 
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Top Hit Descriptor 

TO^AnnR r1 .(inorM rclirvs Wnmn untune rTlNIA r.lnnA IMArtF-Sfil SOfi «?' 


DKFZp547P243_s1 547 (synonym: hfbr1) Homo sapiens cDNA clone DKFZp547P243 3'


Maize mitochondrial tRNA-Ser gene and tRNA-Phe pseudogene


tg94d09.x1 NCI_CGAP_CLL1 Homo sapiens cDNA clone IMAGE:2116433 3'


Human mRNA for KIAA0146 gene, partial cds


Thermoplasma acidophilum complete genome; segment 3/5


Homo sapiens DKFZP586M0122 protein (DKFZP586M0122), mRNA


Homo sapiens DKFZP586M0122 protein (DKFZP586M0122), mRNA


Ovis aries prion protein gene, complete cds


Human papillomavirus type 7 genomic DNA


Fugu rubripes neurofibromatosis type 1 (NF1), A-kinase anchor protein (AKAP84), BAW proteii
WSB1 protein (WSB1) genes, complete cds


Fugu rubripes neurofibromatosis type 1 (NF1), A-kinase anchor protein (AKAP84), BAW proteii
WSB1 protein (WSB1) genes, complete cds


602156687F1 NIH_MGC_83 Homo sapiens cDNA clone IMAGE:4297556 5'


wt45g07.x1 NCI_CGAP_Pan1 Homo sapiens cDNA clone IMAGE:2510460 3'


Homo sapiens mRNA for KIAA1157 protein, partial cds


DNA TOPOISOMERASE III ALPHA


Homo sapiens mRNA for KIAA0905 protein, complete cds


SYNAPSIN II


SYNAPSIN II


Homo sapiens caveolin-1/-2 locus, Contig1, D7S522, genes CAV2 (exons 1, 2a, and 2b), CAV'
2)


he23f05jd NCI_CGAP_CML1 Homo sapiens cDNA clone IMAGE:2919873 3' similar to cental
repetitive element;


LAMININ BETA-2 CHAIN PRECURSOR (S-LAMININ)


LAMININ BETA-2 CHAIN PRECURSOR (S-LAMININ)


GLUCOAMYLASE PRECURSOR (GLUCAN 1,4-ALPHA-GLUCOSIDASE) (1,4-ALPHA-D-Gl
GLUCOHYDROLASE)


Homo sapiens Xq pseudoautosomal region; segment 1/2


RC1-BT0313-301299-012-f05 BT0313 Homo sapiens cDNA


Sceloporus undulatus ornithine transcarbamylase (OTC) mRNA, complete cds 
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Database 
Source 
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X74463.1 | 


AF064564.2 


AF064564.2 


BF681 547.1 | 
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AB032983.1 I 


Q13472 | 
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AJ133269.1 
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ORFSEQ 
ID NO: 


37282] 




37549I 


30713| 
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30754| 




31936| 
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Exon 
SEQ ID 
NO: 
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25325 


2481 5 | 
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21785| 


21819[ 


Probe 
SEQ ID 
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Top Hit Descriptor 


Human Coronavirus gene for membrane protein


Human Coronavirus gene for membrane protein


Homo sapiens MHC binding factor, beta (MHCBFB) mRNA


Homo sapiens MHC binding factor, beta (MHCBFB) mRNA


AV758825 BM Homo sapiens cDNA clone BMFAWC04 5'


zh94a02.r1 Soares_fetal_liver_spleen_1NFLS_S1 Homo sapiens cDNA clone IMAGE:428906 5'


zh94a02.r1 Soares_fetal_liver_spleen_1NFLS_S1 Homo sapiens cDNA clone IMAGE:428906 5'


Human retinoblastoma susceptibility gene exons 1-27, complete cds


PBRl=proline-rich protein {intron 3} [human, Genomic, 898 nt]


a63b11.s1 Soares_fetal_liver_spleen_1NFLS_S1 Homo sapiens cDNA clone IMAGE:435453 3' similar to
contains Alu repetitive element; contains element MER38 repetitive element ;


Picea glauca EMB13 mRNA


Hordeum vulgare gene encoding cysteine proteinase


NADH-UBIQUINONE OXIDOREDUCTASE CHAIN 8 (NADH DEHYDROGENASE 1, CHAIN 8) (NDH-1,
CHAIN 8)


Human adenovirus type 5, complete genome


THROMBOMODULIN PRECURSOR (FETOMODULIN) (TM)


EST388293 MAGE resequences, MAGN Homo sapiens cDNA


Homo sapiens chromosome 21 segment HS21C102


Apple mosaic virus RNA 2 putative polymerase gene, complete cds


SERINE/THREONINE PROTEIN KINASE MINIBRAIN


PROBABLE OXIDOREDUCTASE ZK1290.5 IN CHROMOSOME II


Lycopersicon esculentum putative Mil copy 1 nematode-resistance gene


B2 BRADYKININ RECEPTOR (BK-2 RECEPTOR)


Danio rerio mRNA for Eph-like receptor tyrosine kinase rtk8


B.aphidicola 16S rDNA (host T.suberi)


AMINO-ACID ACETYLTRANSFERASE (N-ACETYLGLUTAMATE SYNTHASE) (AGS) (NAGS)


Callithrix jacchus UBE1 gene derived retroposon on the Y chromosome


Xenopus laevis rac GTPase mRNA, complete cds


PROBABLE ENDONUCLEASE IV (ENDODEOXYRIBONUCLEASE IV)


Enterobacteriaceae sp. JM983 partial groES gene for GroES-like protein and partial groEL gene for GroEL-
like protein, isolate JM983
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Database 
Source 
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NT : . | 


NT | 


NT | 


SWISSPROT | 


NT | 
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P22587 i 
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0.49| 


0.71 1 


0.71 1 


0.81 1 
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16.16| 




4.57| 


1.49 


1.59| 
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2.32| 
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35876| 
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22404| 


22404 1 
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18570) 
Top Hit Descriptor


Enterobacteriaceae sp. JM983 partial groES gene for GroES-like protein and partial groEL gene for GroEL-
like protein, isolate JM983


601456337F1 NIH_MGC_68 Homo sapiens cDNA clone IMAGE:3860049 5'


601456337F1 NIH_MGC_66 Homo sapiens cDNA clone IMAGE:3860049 5'


PHOSPHOGLUCOMUTASE (GLUCOSE PHOSPHOMUTASE) (PGM)


601110258F1 NIH_MGC_16 Homo sapiens cDNA clone IMAGE:3350750 5'


601110258F1 NIH_MGC_16 Homo sapiens cDNA clone IMAGE:3350750 5'


tx42d 0 x\ NCI_CGAP_Lu24 Homo sapiens cDNA clone IMAGE:2272242 3'


Homo sapiens X28 region near ALD locus containing dual specificity phosphatase 9 (DUSP9), ribosomal
protein L18a (RPL18a), Ca2+/Calmodulin-dependent protein kinase I (CAMKI), creatine transporter (CRTR),
CDM protein (CDM), adrenoleukodystrophy protein >


Drosophila melanogaster sodium channel protein (para) gene, exons 9, 10, 11, 12 and optional segments b, c, d
and e, partial cds


Triticum aestivum stripe rust resistance protein Yr10 (Yr10) gene, complete cds


Salmonella typhimurium adenine-methyltransferase (mod) and restriction endonuclease (res)


PM2-UM0053-240300-005-f12 UM0053 Homo sapiens cDNA


Parvovirus B19 DNA, patient C, genome position 2448-2994


Parvovirus B19 DNA, patient C, genome position 2448-2994


Arabidopsis thaliana DNA chromosome 4, ESSA 1 FCA contig fragment No. 6


P.falciparum complete gene map of plastid-like DNA (IR-A)


Rattus norvegicus (strain R21) Rps2r gene, complete cds


AV752605 NPD Homo sapiens cDNA clone NPDBAG06 5'


AV752605 NPD Homo sapiens cDNA clone NPDBAG06 5'


Homo sapiens centrosomal protein 2 (CEP2), mRNA


Sphyrna tiburo NADH dehydrogenase subunit 2 (NADH2) gene, mitochondrial gene encoding mitochondrial
protein, partial cds


Homo sapiens CGI-125 protein (LOC51003), mRNA


601675639F1 NIH_MGC_21 Homo sapiens cDNA clone IMAGE:3955473 5'


601675639F1 NIH_MGC_21 Homo sapiens cDNA clone IMAGE:3958473 5'


RC1-CT0295-241199-011-b02 CT0295 Homo sapiens cDNA
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|L81138.1 I 
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Top Hit Descriptor 


601820312F1 NIH_MGC_58 Homo sapiens cDNA clone IMAGE:4052018 5'


ye52f01.s1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:121369 3' similar to contains
Alu repetitive element;


Homo sapiens hypothetical protein FLJ20048 (FLJ20048), mRNA


AB200G8R Infant brain, LLNL array of Dr. M. Soares 1NIB Homo sapiens cDNA clone LLAB200G8 5'


AB200G8R Infant brain, LLNL array of Dr. M. Soares 1NIB Homo sapiens cDNA clone LLAB200G8 5'


Human pre-B cell stimulating factor homologue (SDF1b) mRNA, complete cds


INTER-ALPHA-TRYPSIN INHIBITOR HEAVY CHAIN H3 PRECURSOR (ITI HEAVY CHAIN H3)


Rattus norvegicus Rab3 GDP/GTP exchange protein mRNA, complete cds


P80-COILIN


Homo sapiens uncoupling protein-3 (UCP3) gene, complete cds


Homo sapiens neurexin III-alpha gene, partial cds


Danio rerio LIM class homeodomain protein (lim5) mRNA, complete cds


Xenopus laevis gene for aldolase, complete cds


Danio rerio semaphorin Z1a mRNA, complete cds


Fugu rubripes neural cell adhesion molecule L1 homolog (L1-CAM) gene, complete cds; putative protein 1
(PUT1) gene, partial cds; mitosis-specific chromosome segregation protein SMC1 homolog (SMC1) gene,
complete cds; and calcium channel alpha-1 subunit*


Rabbit MHC fragment RLA-DF DNA


Oithona nana cytochrome-c oxidase subunit 1 (coxl) gene, partial cds; mitochondrial gene for mitochondrial
product


Xylella fastidiosa, section 90 of 229 of the complete genome


Chlamydophila pneumoniae AR39, section 21 of 94 of the complete genome


PUTATIVE F420-DEPENDENT NADP REDUCTASE


Pseudorabies virus Ea glycoprotein M gene, complete cds


Homo sapiens cell death-inducing DFFA-like effector B (CIDEB), mRNA


M.aeruginosa (HUB 5-2-4) DNA from plasmid PMA1


Synechocystis sp. PCC6803 complete genome, 13/27, 1576593-1719643


Homo sapiens SOS1 (SOS1) gene, partial cds


Homo sapiens AT-binding transcription factor 1 (ATBF1), mRNA
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20004| 
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13238 1 
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Top Hit Descriptor 


Homo sapiens protein tyrosine phosphatase, receptor-type, zeta polypeptide 1 (PTPRZ1) mRNA


Homo sapiens secreted C-type lectin precursor (LSLCL) gene, complete cds


Mycoplasma genitalium section 9 of 51 of the complete genome


zu42h12.y5 Soares ovary tumor NbHOT Homo sapiens cDNA clone IMAGE:740711 5'


zu42h12.y5 Soares ovary tumor NbHOT Homo sapiens cDNA clone IMAGE:740711 5'


7e73c12.x1 NCI_CGAP_Pr28 Homo sapiens cDNA clone IMAGE:3288118 3' similar to gb:J02783
PROTEIN DISULFIDE ISOMERASE PRECURSOR (HUMAN);


7e73c12.x1 NCI_CGAP_Pr28 Homo sapiens cDNA clone IMAGE:3288118 3' similar to gb:J02783
PROTEIN DISULFIDE ISOMERASE PRECURSOR (HUMAN);


Roridula gorgonias ribulose 1,5-bisphosphate carboxylase (rbcL) gene, partial cds; chloroplast gene for
chloroplast product


7q71c12.x1 NCI_CGAP_Lu24 Homo sapiens cDNA clone IMAGE: 3' similar to contains element MER29
repetitive element ;


7q71c12.x1 NCI_CGAP_Lu24 Homo sapiens cDNA clone IMAGE: 3' similar to contains element MER29
repetitive element ;


wx94b02o1 NCI_CGAP_Mel15 Homo sapiens cDNA clone IMAGE:2551275 3' similar to
SW:COXA_HUMAN P20674 CYTOCHROME C OXIDASE POLYPEPTIDE VA PRECURSOR ;


601339867F1 NIH_MGC_53 Homo sapiens cDNA clone IMAGE:3682168 5'


BASEMENT MEMBRANE-SPECIFIC HEPARAN SULFATE PROTEOGLYCAN CORE PROTEIN
PRECURSOR (HSPG) (PERLECAN) (PLC)


og30e05.s1 NCI_CGAP_Br7 Homo sapiens cDNA clone IMAGE:1441376 3' similar to gb:J02611
APOLIPOPROTEIN D PRECURSOR (HUMAN);


NUCLEAR FACTOR OF ACTIVATED T CELLS 5 (T CELL TRANSCRIPTION FACTOR NFAT5) (NF-AT5)
(REL DOMAIN-CONTAINING TRANSCRIPTION FACTOR NFAT5)


Homo sapiens phospholipid scramblase 1 gene, complete cds


Homo sapiens mRNA for KIAA0740 protein, partial cds


Chlamydophila abortus strain S26/3 POMP91A and POMP90A precursor, genes, complete cds


Azotobacter vinelandii icd gene for isocitrate dehydrogenase, complete cds


Botrytis cinerea strain T4 cDNA library under conditions of nitrogen deprivation


am77g05.s1 Stratagene schizo brain S11 Homo sapiens cDNA clone IMAGE:1616504 3'
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|AL1 16780.1 | 


|AA984165.1 | 
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Top Hit Descriptor 


Medicago sativa chloroplast malate dehydrogenase precursor (p1 mdh) mRNA, nuclear gene encoding
chloroplast protein, complete cds


Mus musculus acetylcholine receptor beta (Acrb), mRNA


Mus musculus vanilloid receptor-like protein 1 (Vrh), mRNA


Chicken duplicated genes for histone H2A, H4 and a histone H3 gene


Chicken duplicated genes for histone H2A, H4 and a histone H3 gene


zq05b09.r1 Stratagene muscle 937209 Homo sapiens cDNA clone IMAGE:628793 5'


Homo sapiens PELOTA (PELOTA) gene, complete cds


RETINOIC ACID RECEPTOR GAMMA (RAR-GAMMA) (RETINOIC ACID RECEPTOR DELTA) (RAR-
DELTA)


Polyangium vitellinum (strain PI vt1) 16S rRNA gene


Polyangium vitellinum (strain PI vt1) 16S rRNA gene


R.norvegicus mRNA for mammalian fusca protein


602139319F1 NIH_MGC_46 Homo sapiens cDNA clone IMAGE:4298117 5'


TRANSCRIPTION-REPAIR COUPLING FACTOR (TRCF)


Human alpha 1a adrenergic receptor (alpha1a) gene, 5' flanking region


601063606F1 NIH_MGC_10 Homo sapiens cDNA clone IMAGE:3450000 5'


AV712326 DCA Homo sapiens cDNA clone DCAAUF07 5'


yi94a09.s1 Soares placenta Nb2HP Homo sapiens cDNA clone IMAGE:146872 3'


QV4-ST0023-160400-172-a01 ST0023 Homo sapiens cDNA


QV4-ST0023-160400-172-a01 ST0023 Homo sapiens cDNA


Human regenerating protein (reg) gene, complete cds


65B1 Human retina cDNA Tsp509I-cleaved sublibrary Homo sapiens cDNA not directional


601556863F1 NIH_MGC_58 Homo sapiens cDNA clone IMAGE:3826767 5'


nac51f10.x1 NCI_CGAP_Brn23 Homo sapiens cDNA clone IMAGE:3406218 3' similar to contains element
TAR1 repetitive element ;


Homo sapiens postmeiotic segregation increased 2-like 9 (PMS2L9), mRNA


Homo sapiens postmeiotic segregation increased 2-like 9 (PMS2L9), mRNA
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Top Hit Descriptor 


hc90c02.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2907266 3' similar to TR:O95714
O95714 HERC2. ;


Mus musculus unc13 homolog (C. elegans) 1 (Unc13h1), mRNA


Mus musculus adenylyl cyclase 1 (Adcy1) cDNA, partial cds


H.sapiens DNA for BCL7A gene and BCL7A/IGH locus fusion


Homo sapiens neurotrophin-1/B-cell stimulating factor-3 gene, complete cds


nq22e11.s1 NCI_CGAP_Co10 Homo sapiens cDNA clone IMAGE:1144652 3'


Homo sapiens potassium channel, subfamily K, member 5 (TASK-2) (KCNK5) mRNA, and translated
products


Saccharomyces cerevisiae sporulation protein (SPO11) gene required for meiotic recombination, complete
cds


Mus musculus slow skeletal muscle troponin T (Tnnt1) gene, complete cds


nu85f09.s1 NCI_CGAP_Alv1 Homo sapiens cDNA clone IMAGE:1217513


Homo sapiens reproduction 8 (D8S2298E) mRNA


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 4


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 4


yj77f10.y5 Soares breast 2NbHBst Homo sapiens cDNA clone IMAGE:154795 5' similar to contains element
MER6 repetitive element ;


PM1-HT0350-201299-004-D04 HT0350 Homo sapiens cDNA


S.cerevisiae ORFs from chromosome X


601883880F1 NIH_MGC_57 Homo sapiens cDNA clone IMAGE:4098387 5'


hbc811 Human pancreatic islet Homo sapiens cDNA clone hbc811 5'end


hbc811 Human pancreatic islet Homo sapiens cDNA clone hbc811 5'end


Rattus norvegicus Spermine binding protein (Sbp), mRNA


Influenza A virus isolate hk51697 hemagglutinin (HA) gene, partial cds


Human collagen alpha2(XI) (COL11A2) gene, exons 6 through 16, and partial cds


RC6-NT0029-240400-011-E08 NT0029 Homo sapiens cDNA


601611333F1 NIH_MGC_71 Homo sapiens cDNA clone IMAGE:3912488 5'


hd11c08.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2909198 3'
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34816| 




36957) 


37206| 


37342| 






Exon 
SEQ ID 
NO: 


21775 


25431 | 


22864 | 


23065j 


24486 | 


25382 


17037 
18219 


19342| 


19352| 


199011 


20325] 


20325 


20500 
21875 


23322! 


I £ 

; s 

H CM 


19190| 


19423 | 


20460 | 


8 


21664j 


23436| 


1 

CN 


23914| 


24039| 


24609! 


24689| 


Probe 
SEQ ID 
NO: 


s 

o 

CD 


| 9196| 


I 10216| 


| 104191 


| 11925| 


12709 




6579 1 


6589 1 


mis 


7661 


1 


i 7805 
! 9144 


I 106291 


12795| 


! 6422| 


I 


7764 




j 8974| 


[ 10751 1 


| 11022| 


I 11252| 


| 11349| 


| 12116| 


| 12237| 
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CO 
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WO 01/57275 
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Top Hit Descriptor 


Escherichia coli K-12 MG1655 section 108 of 400 of the complete genome


Bombyx mori nuclear polyhedrosis virus, complete genome


EST02531 Fetal brain, Stratagene (cat#936206) Homo sapiens cDNA clone HFBCY17


EST02531 Fetal brain, Stratagene (cat#936206) Homo sapiens cDNA clone HFBCY17


xo14h01.x1 NCI_CGAP_Ut3 Homo sapiens cDNA clone IMAGE:2703985 3' similar to SW:INT6_MOUSE
Q64252 VIRAL INTEGRATION SITE PROTEIN INT-6. [1] ;


AV719382 GLC Homo sapiens cDNA clone GLCCED12 5'


601449201F1 NIH_MGC_65 Homo sapiens cDNA clone IMAGE:3852961 5'


602035275F1 NCI_CGAP_Brn64 Homo sapiens cDNA clone IMAGE:4183280 5'




VASCULAR ENDOTHELIAL GROWTH FACTOR B PRECURSOR (VEGF-B) (VEGF RELATED
FACTOR)


Rattus norvegicus SynGAP-b mRNA, complete cds


Rattus norvegicus SynGAP-b mRNA, complete cds




601237139F1 NIH_MGC_44 Homo sapiens cDNA clone IMAGE:3609393 5'


HISTIDINE-RICH GLYCOPROTEIN PRECURSOR


HISTIDINE-RICH GLYCOPROTEIN PRECURSOR


mucin [rats, Sprague-Dawley, sulfur-dioxide-treated tracheal epithelium, mRNA Partial, 390 nt]


AV720408 GLC Homo sapiens cDNA clone GLCCSC12 5'


qJ92h11.x1 NCI_CGAP_Brn25 Homo sapiens cDNA clone IMAGE:1881125 3' similar to TR:Q29168 Q29168
UNKNOWN PROTEIN ;


qi62h11.x1 NCI_CGAP_Brn25 Homo sapiens cDNA clone IMAGE:1861125 3' similar to TR:Q29168 Q29168
UNKNOWN PROTEIN ;


xc27e08.x1 NCI_CGAP_Co18 Homo sapiens cDNA clone IMAGE:2585510 3' similar to TR:O95154 O95154
AFLATOXIN B1-ALDEHYDE REDUCTASE. ;


ae85d11.s1 Stratagene schizo brain S11 Homo sapiens cDNA clone IMAGE:970965 3' similar to gb:M16038
TYROSINE-PROTEIN KINASE LYN (HUMAN);


Helicobacter pylori 26695 section 49 of 134 of the complete genome


Treponema pallidum section 4 of 87 of the complete genome


S.tuberosum mRNA for induced stolon tip protein (partial)


zl69a03.s1 Stratagene colon (#937204) Homo sapiens cDNA clone IMAGE:509836 3'


HIV-1 isolate 081 07 v6 from USA, envelope glycoprotein (env) gene, partial cds


Top Hit 
Database 
Source 


NT


NT


r

LU


EST_HUMAN


EST_HUMAN


l


EST_HUMAN


EST_HUMAN


NT


SWISSPROT


NT


NT


UJ


i

k

UJ


SWISSPROT


SWISSPROT


NT


EST_HUMAN


i

UJ


EST_HUMAN


EST_HUMAN


EST_HUMAN


NT


NT


NT


X

UJ z


Top Hit Accession
No. 


AE000218.1


9630816| 


M860O8.1 | 


M86006.1 | 


AW591271.1 


AV719382.1 ! 


BE871 461.1 


BF337531.1 | 


11422099| 


P49765 


IAF058790.1 | 


IAF058790.1 


i 

e 

m 


|BE378707.1 | 


IP04929 | 


|P04929 j 


IS65019.1 I 


jAV720408.1 j 


AI198413.1 


CO 

3 

CO 

< 


AW080795.1 


AA776132.1 


|AE000571.1 j 


1 
1 

UJ 

< 


IZ11679.1 | 


AA056427.1 
AF1 12540.1 


Most Similar 
(Top) Hit 
BLAST E 
Value 


j 4.5E-01 1 


| 4.5E-01 1 


| 4.5E-01| 


o 

IU 
to 

V- 


4.5E-01 


4.5E-01| 


| 4.5E-01 1 


| 4.5E-01| 


o 

UJ 
to 


| 


| 4.4E-01| 


4.4E-01! 


I 4.4E-01| 




3 


S 


4.4E-01|


4.4E-01| 


4.4E-01 


4.4E-01 


4.4E-01 


4.4E-01


i 4.4E-01 1 


i 


3 


4.4E-01 
4.4E-01 


Expression 
Signal 


s 

o" 


I 1-021 


CM 
CD 


CM 
CD 

a 


2.15 


I 1-521 


| 3.52| 


I 1.58| 


3.37 1 


3.39 


9 


o> 


2.92| 


1.88| 


CM 


CM 


1.59| 


CM 


1.46 




1.78 


1.42 


I 1-041 


CO 

o 


1 9.711 


0.84 
0.7 


ORFSEQ 
ID NO: 


| 34728[ 




! 36254 


| 36255 


36S99 










27847 


2871 9 1 


28720 1 


28723 1 




30797 | 


30 798 | 


31309| 


31328| 


31591 


31592 


31894 




33056| 






34500 
34896 


Exon 
SEQ ID 
NO: 


| 21588 


| 22491 


[ 23038 1 


| 23038[ 


23455 


| 23880| 


| 25384| 


| 248801 


! 24918| 


15109 


| 16070 | 


16070| 


16073| 


16950 | 


18137) 


18137] 


1B397) 


1B415| 


18651 


1 18651 


18923 


19010 


I 19980) 


o 

CM 


20436] 


21353 
21738 


Probe 
SEQ ID 
NO: 


| 8897 1 


| 9840 


[ 10392) 


| 10392 1 


10772 


| 11217| 


I 11895| 


| 12540 | 


| 1261 1 1 


CO 

8 


i 3310| 


| 3310| 


| 3313| 


| 4209| 


| S334| 


| 5334| 


| 5602| 


| 5619] 


5864 


I 


6146 


6236 


! 7297| 


\ 77231 


I 7740| 
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Top Hit Descriptor 


Human calbindin 27 gene, exons 10 and 11, and L1 and Alu repeats


Porphyra purpurea mitochondrion, complete genome


Nicotiana tabacum mRNA for TATA binding protein (TBP), complete cds


AV685974 GKC Homo sapiens cDNA clone GKCBQC1 1 4T


AV702623 ADB Homo sapiens cDNA clone ADBDBE06 5'


Homo sapiens proteoglycan 3 (PRG3) gene, complete cds


HOMEOBOX PROTEIN HLX1


Homo sapiens hypothetical protein FLJ10583 (FLJ10583), mRNA




Xylella fastidiosa, section 16 of 229 of the complete genome


Caenorhabditis briggsae acetylcholinesterase (ace-1) gene, complete cds


Arabidopsis thaliana putative c-myb-like transcription factor (MYB3R-3) mRNA, complete cds


Mus musculus solute carrier family 1, member 6 (Slc1a6), mRNA


Pleuronectes americanus aminopeptidase N (ampN) gene, partial cds


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 30




PM0-HT0339-200400-010-G01 HT0339 Homo sapiens cDNA


Mus musculus general transcription factor II I (Gtf2i), mRNA


Takifugu rubripes wnt2 (partial), frank1, cftr and frank2 (partial) genes


TRANSCRIPTION FACTOR SOX-10


prion protein [mink, Genomic, 2446 nt]


QV3-BT0537-271299-049-e02 BT0537 Homo sapiens cDNA


ta54f11.x1 Soares_total_fetus_Nb2HF8_9w Homo sapiens cDNA clone IMAGE:2047917 3' similar to
contains Alu repetitive element;


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 25


M.musculus gene for kallikrein-binding protein


Mouse liver receptor homologous protein (LRH-1) mRNA, complete cds


Homo sapiens mRNA for KIAA1631 protein, partial cds


Homo sapiens FOS-like antigen-1 (FOSL1), mRNA


Homo sapiens chromosome 21 segment HS21C079


ye43h06.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:120539 5' similar to contains
Alu repetitive element;contains PTR5 repetitive element ;


Top Hit 
Database 
Source 


NT


NT


NT


EST_HUMAN


EST_HUMAN


NT


SWISSPROT


NT


NT


NT


NT


NT


NT


NT


NT


EST_HUMAN


EST_HUMAN


i


NT


in!


SWISSPROT


NT


EST_HUMAN


EST_HUMAN


NT


NT


NT


NT


| 


in! 


i 


Top Hit Accession
No. 


8 
i 


11465620| 


D86722.1 | 


AV695974.1 [ 


i 

< 


AF304354.1 j 


Q61670 [ 


114333351 


7019488| 


AE003870.1 | 


U41848.1 [ 


AF214117.1 | 


6678002I 


AF043383.1 | 


<N 

CO 

5 
5 

5 


AI807219.1 ! 


AI807219.1 [ 


BE154OB0.1 j 


6754095| 


AJ271361.2 | 


Q04888 j 


S46825.1 | 


BE072399.1 ] 


AI374601.1 


AL161513.2 


X61 597.1 | 


CO 


AB046851.1 [ 


S 

CM 

5 


AL1 63279.2 | 


CO 

i 


Most Similar 
(Top) Hit 
BLAST E 
Value 


3.9E-01


3.9E-01 1 


9 

Hi 

to 


3.9E-01) 


3.9E-01 1 


3.9E-01 1 


3.9E-01 1 


S 

«o 


3.8E-01 1 


i 

co 


3.8E-011 


3.8E-01


3.8E-01 1 


• 3.8E-011 


3.8E-01 1 


3.8E-01 ] 


3.8E-01 1 


o 

LLJ 

CO 

co 


3.8E-01| 


1 

to' 


3.8E-01 1 


3.8E-01 1 


9 

a 

CO 


3.8E-01 


3.8E-01] 


3.8E-01| 


3.8E-01 1 


3.8E-01 j 


3.8E-01 1 


3.8E-01) 


3.8E-01 


Expression 
Signal 


3.03| 


0.58| 


0.77| 


1.98| 


1.47| 


3.37| 


2.08| 


1.44| 


8.33 1 


1.03| 


1.29| 


1.62| 


3.96) 


1.39| 


7.98| 


0.79| 


1.22| 


1.161 


5 
d 


0.74| 


1.42| 


0.74| 


to 

LO 


4.58 


1.25] 


4.42] 


0.86| 


2.04| 


1.021 


1.28| 


3.55 


ORF SEQ 
ID NO: 


35635, 




35932] 




37674| 












2791 8| 


28027 1 


28092I 


28466 1 


28887 1 






29127] 


29287| 


29416] 


31221] 




32298] 


32614 


32527 [ 




34028] 


34289] 


34358 1 


34551 | 




Exon 
SEQ ID 

NO: 


22429 i 


22496| 


22714| 


2341 0| 


24344) 


25295| 


24581 | 


24891 | 


12971 | 


14601 | 


15178| 


15290 | 


156011 


15809| 


16233 | 


16283] 


16283] 


16492| 


I 


16788 1 


18320 | 


19021 | 




19579 


CM 

o 
to 

CO 


20093; 


20890 1 


21147) 


.21215] 


21408) 


22011 


o Q 


1 


| 9645| 


| 10066 1 


| 10722| 


| 11753| 


[ 11948j 


12066| 


12559 | 


CO 


1863] 


2460 1 


2576| 


i! 

CM O 


| 3043) 


3477 


| 3527] 


! 3541 | 


S 3739 I 


| 3897] 


| 4043] 


| 5522] 


[ 6247| 


| 6526| 


6662 


| 6840 I 


CO 

5 


] 8196] 


I 8455] 


| 6523 | 


| 8716) 


9461 
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Top Hit Descriptor 
Danio rerio homeobox protein (hoxb5b) gene, complete cds


RC5-HT0218-181099-011-g02 HT0218 Homo sapiens cDNA




Rat leukocyte common antigen (L-CA) gene, exons 1 through 5


EARLY E2A DNA-BINDING PROTEIN


EARLY E2A DNA-BINDING PROTEIN


Human mRNA for KIAA0086 gene, complete cds


PM4-SN0012-030400-001-a11 SN0012 Homo sapiens cDNA


zw79f03.r1 Soares_testis_NHT Homo sapiens cDNA clone IMAGE:782429 5' similar to TR:G1066935
G1066935 F10F2.1 ;


Bos taurus peptide methionine sulfoxide reductase (msrA) mRNA, complete cds


GLUCOSE-6-PHOSPHATE 1-DEHYDROGENASE, CHLOROPLAST PRECURSOR (G6PD)




HISTIDYL-TRNA SYNTHETASE (HISTIDINE-TRNA LIGASE) (HISRS)


HISTIDYL-TRNA SYNTHETASE (HISTIDINE-TRNA LIGASE) (HISRS)


Homo sapiens tumor protein p53-binding protein, 2 (TP53BP2), mRNA


5 

•3 
» 

j 

8 

I 

X 

o 

9 
§ 

1 
o 

s 

S 


Rattus norvegicus Na-K-Cl cotransporter (Nkcc1) mRNA, complete cds


Homo sapiens tyrosine kinase non-receptor 1 (TNK1), mRNA


VOLTAGE-DEPENDENT N-TYPE CALCIUM CHANNEL ALPHA-1B SUBUNIT (CALCIUM CHANNEL, L
TYPE, ALPHA-1 POLYPEPTIDE ISOFORM 5) (BRAIN CALCIUM CHANNEL III) (BIII)


X.laevis gene for albumin including HP1 enhancer


< 

2 

§ 
f 
i 

X 

«o 

I 
r- 

! 

CNJ 

Q 
O 

f 

1 

O 


C.griseus rhodopsin gene for opsin protein


Gallus gallus SPARC gene for osteonectin, promoter and exon 1


Human breakpoint cluster region (BCR) gene, complete cds
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CL 
CO 
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1 

CM 
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Drosophila melanogaster dual bar protein (BarH2) gene, exon 1 | 


Human glucokinase (GCK) gene, repeat polymorphism


HA0542 Human fetal liver cDNA library Homo sapiens cDNA


B.taurus atpA1 gene for F(0)F(1) ATP synthase alpha-subunit


Thermotoga maritima section 86 of 136 of the complete genome


Top Hit Database Source


EST_HUMAN




EST_HUMAN




2


SWISSPROT


SWISSPROT


y—

2


EST_HUMAN


EST_HUMAN




SWISSPROT


r-

2


SWISSPROT


SWISSPROT


r-

2


EST_HUMAN






SWISSPROT 


2 
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i 

i 






E2 


EST_HUMAN
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ESTJHUMAN I 
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AF071 253.1 | 


BE146585.1


Y18477.1


M18349.1


Q96687


CO 

s 

CO 

O 


D42045.1 | 


CD 

5 

8 
5 
< 


AA431833.1


U37150.1


O24357


tn 
I 

s 


P47281


P47281


11448042


BF358871.1


AF051561.1


4507610


Q02294


Z26825.1


BE174794.1


X61084.1


AJ243178.1


U07000.1


N77597.1


M82885.1


L05145.1


AI064773.1


X64565.1


AE001774.1


Most Similar 
(Top) Hit 
BLAST E 
Value 


3.5E-01


3.5E-01


1 

CO 


3.5E-01


3.5E-01


3.5E-01


3.5E-01


3.5E-01


3.5E-01


3.5E-01


3.5E-01


3.5E-01


3.5E-01


3.5E-01


3.5E-01


3.5E-01


3.5E-01


3.5E-01


3.5E-01


o 

\k 

CO 


3.5E-01


3.5E-01


1 

CO 


9 c 

Lo B 

CO c 


3.5E-01


3.5E-01


3.5E-01


3.5E-01
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CO 


3.5E-01) 


Expression 
Signal 
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d 


CO 

r- 

CM 


o» c 

CO c 
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CO 
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ORFSEQ 
ID NO: 




CD 

i 


29805


29995


30230


cn 
co 


30886


31152




32070


32124


32338




33207


33208




33790




34662


35481


35644


35713


I 

CO 


36875 
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Top Hit Descriptor 


Beta vulgaris mitochondrion, complete genome


Mus musculus SIL, MAP17, CYP_a, SCL & CYP_b genes


Homo sapiens HLA class III region containing tenascin X (tenascin-X) gene, partial cds; cytochrome P450
21-hydroxylase (CYP21B), complement component C4 (C4B), G11, helicase (SKI2W), RD, complement factor B
(Bf), and complement component C2 (C2) genes,>


Rhizobium leguminosarum sym plasmid pRL5JI nodX gene


Rhizobium leguminosarum sym plasmid pRL5JI nodX gene


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 45


Homo sapiens KIAA1100 protein (KIAA1100), mRNA


PROLINE-RICH PROTEIN LAS17


602184016T1 NIH_MGC_42 Homo sapiens cDNA clone IMAGE:4300251 3'


Human chromosome 15q11-q13 putative DNA replication origin in the g-aminobutyric acid receptor b3 and a5
gene cluster


Mus musculus disintegrin 5 (Dtgn5), mRNA


EST36722 Embryo, 8 week I Homo sapiens cDNA 5' end


Homo sapiens uridine monophosphate synthetase (orotate phosphoribosyl transferase and
orotidine-5'-decarboxylase) (UMPS) mRNA


Bacteriophage phi-Ye03-12 complete genome


Streptomyces argillaceus mithramycin biosynthetic genes


Homo sapiens MTA1-L1 gene, complete cds


EXODEOXYRIBONUCLEASE V BETA CHAIN


GENOME POLYPROTEIN [CONTAINS: N-TERMINAL PROTEIN (P1); HELPER COMPONENT
PROTEINASE (HC-PRO); PROTEIN P3]


Homo sapiens A kinase (PRKA) anchor protein 5 (AKAP5), mRNA


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 10


Hypoxylon fragiforme chitin synthase gene, partial cds


Rattus norvegicus DNA for regucalcin, partial cds


tp78b12.x1 NCI_CGAP_Ut3 Homo sapiens cDNA clone IMAGE:2205407 3' similar to gb:X57522 ANTIGEN
PEPTIDE TRANSPORTER 1 (HUMAN);


Synechocystis sp. PCC6803 complete genome, 22/27, 2755703-2868766


QV0-DT0047-170200-123-H08 DT0047 Homo sapiens cDNA


R.norvegicus mRNA for 3'UTR of ubiquitin-like protein


R.norvegicus mRNA for 3'UTR of ubiquitin-like protein
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Q12446 | 
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Top Hit Descriptor 


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 61


Fusarium poae virus 1 RNA2 putative RNA dependent RNA polymerase gene, complete cds


i 

o 

n 
» 

OJ 
CL 


LACTOSE PERMEASE (LACTOSE-PROTON SYMPORT) (LACTOSE TRANSPORT PROTEIN)


S.cerevisiae chromosome II reading frame ORF YBR172C


EST369264 MAGE resequences, MAGD Homo sapiens cDNA


Botrytis cinerea strain T4 cDNA library under conditions of nitrogen deprivation


601868804F1 NIH_MGC_17 Homo sapiens cDNA clone IMAGE:4111512 5'


Mus musculus Pbx/knotted 1 homeobox (Pknox1), mRNA


Homo sapiens promyelocytic leukemia zinc finger protein (PLZF) gene, complete cds
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CN 
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X 


Homo sapiens symplekin (SYM) mRNA


Rabbit beta-like globin gene cluster encoding the epsilon, gamma, delta (pseudogene) and beta globin
polypeptides, complete cds


HYPOTHETICAL 81.7 KD PROTEIN C13G7.04C IN CHROMOSOME I PRECURSOR


602081972F1 NIH_MGC_81 Homo sapiens cDNA clone IMAGE:4246505 5'


CYTADHERENCE HIGH MOLECULAR WEIGHT PROTEIN 3 (CYTADHERENCE ACCESSORY
PROTEIN 3) (ACCESSORY ADHESIN PROTEIN 3) (P69)


601465591F1 NIH_MGC_67 Homo sapiens cDNA clone IMAGE:3868799 5'


CM0-HT0569-060300-269-f10 HT0569 Homo sapiens cDNA


Giardia intestinalis pyruvate:flavodoxin oxidoreductase and flanking genes


Fugu rubripes gamma-aminobutyric acid receptor beta subunit gene, partial cds; 55kd erythrocyte membrane
protein (P55), synaptic vesicle-associated integral membrane protein (VAMP-1), procollagen C-proteinase
enhancer protein (PCOLCE) genes, complete c>


AV718037 FHTA Homo sapiens cDNA clone FHTAABH01 5'


Human mRNA for KIAA0361 gene, KIAA0361 protein


Homo sapiens partial LMO1 gene for LIM domain only 1 protein, exon 1


Rat ISO-atrial natriuretic factor gene, complete cds


Rattus norvegicus repeat; map NOS-D12Wox1


H.sapiens gene fragment for acetylcholine receptor (AChR) alpha subunit exons 8, 9 and 3' flanking region
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Arabldopsis thaliana DNA chromosome 4, contig fragment No. 70 | 
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AF047013.1 | 


Z50202.1
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Z36041.1 | 


AW957194.1


AL111655.1


BF203817.1


7710079


AF060568.1


D10872.1


4759195


M18818.1


Q10268


BF893617.1


Q57081


BE782748.1


BE173964.1


L27221.1


AF016494.1


AV718037.1


AB002359.1


AJ277661.1


M60266.1


AJ231001.1


X02508.1


BF311635.1


AL161574.2


Most Similar 
(Top) Hit 
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Top Hit Descriptor 


RC3-BT0333-180700-111-a03 BT0333 Homo sapiens cDNA


RC3-BT0333-180700-111-a03 BT0333 Homo sapiens cDNA


Mus musculus 129/sv Clara cell 10 kd protein (mCC10) gene, complete cds


Mouse cytokeratin 15 gene, complete cds


Strongylocentrotus purpuratus 34/67 kDa laminin-binding protein mRNA, partial cds


Cantagalo orthopoxvirus hemagglutinin gene, complete cds


Homo sapiens chromosome 21 segment HS21C006


Mus musculus midnolin (Midn-pending), mRNA


Streptococcus pneumoniae strain DBL5 PspA (pspA) gene, partial cds


Thermotoga maritima section 67 of 136 of the complete genome


Mus musculus C-type (calcium dependent, carbohydrate recognition domain) lectin, superfamily member 9
(Clecsf9), mRNA


601339079F1 NIH_MGC_53 Homo sapiens cDNA clone IMAGE:3681594 5'


Streptomyces sulfonofaciens isopenicillin N synthase (pcbC) gene, partial cds


Homo sapiens DKFZP586M0122 protein (DKFZP586M0122), mRNA


Anabaena PCC7120 cytosine-specific DNA methyltransferase (dmnB) gene, complete cds; putative
anthranilate phosphoribosyltransferase gene, partial cds; and unknown gene


RC2-BN0074-240400-110-h12 BN0074 Homo sapiens cDNA


602133271F1 NIH_MGC_81 Homo sapiens cDNA clone IMAGE:4288336 5'


Actinobacillus actinomycetemcomitans TadA (tadA), TadB (tadB), TadC (tadC), TadD (tadD), TadE (tadE),
TadF (tadF), and TadG (tadG) genes, complete cds
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Aspergillus oryzae bipA gene for ER chaperons BiP, complete cds [ 


602140133F1 NIH_MGC_46 Homo sapiens cDNA clone IMAGE:4301097 5'


602140133F1 NIH_MGC_46 Homo sapiens cDNA clone IMAGE:4301097 5'


yp84b10.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:194107 5'


Rattus norvegicus mRNA for glyceraldehyde-3-phosphate dehydrogenase type 2 (gapdh-2 gene)


Mus musculus ribose 5-phosphate isomerase A (Rpia), mRNA


Aquifex aeolicus section 68 of 109 of the complete genome


Chrysodidymus synuroideus mitochondrion, complete genome


PM1-CT0326-171299-001-f12 CT0326 Homo sapiens cDNA
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Top Hit Descriptor 


Borrelia burgdorferi (section 66 of 70) of the complete genome


ov44g10.x1 Soares_testis_NHT Homo sapiens cDNA clone IMAGE:1640226 3' similar to contains Alu
repetitive element;contains element MER22 repetitive element ;


Mus musculus chromosome X contigA; putative Magea9 gene, Caltractin, NAD(P) steroid dehydrogenase
and Zinc finger protein 185


RNA POLYMERASE BETA SUBUNIT (LARGE STRUCTURAL PROTEIN) (L PROTEIN)


Hepatitis G virus isolate 60 (SZNAE12) polyprotein precursor, gene, partial cds


Bovine adenovirus 3 complete genome


602042601F1 NCI_CGAP_Brn67 Homo sapiens cDNA clone IMAGE:4180129 5'


ql59c11.x1 Soares_NhHMPu_S1 Homo sapiens cDNA clone IMAGE:1876628 3' similar to contains Alu
repetitive element;contains element LTR5 repetitive element ;


EST57072 Infant brain Homo sapiens cDNA 5' end


Homo sapiens OCTN2 gene, complete cds
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2t41f01.r1 Soares ovary tumor NbHOT Homo sapfens cDNA done IMAGE: 724921 5* stmflar to contains Alu 
repetitive element; 


Bovine 680 bp repeated unit of 1.723 satellite DNA


Mesembryanthemum crystallinum fructose-biphosphate aldolase mRNA, complete cds
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Marsileaquadrifolia ribulose-1 ,5-bisphosphate carboxylase/oxygenase large subunit (rbcL) gene, chloroplast 
gene encoding chloroplast protein, partial cds 


L.esculentum ypt2 mRNA for GTP-binding protein


qp48h01.x1 NCI_CGAP_Co8 Homo sapiens cDNA clone IMAGE:1926289 3' similar to gb:X06323_cds1
MITOCHONDRIAL 60S RIBOSOMAL PROTEIN L3 (HUMAN);


qp48h01.x1 NCI_CGAP_Co8 Homo sapiens cDNA clone IMAGE:1926289 3' similar to gb:X06323_cds1
MITOCHONDRIAL 60S RIBOSOMAL PROTEIN L3 (HUMAN);


Homo sapiens lanosterol 14-alpha demethylase cytochrome P450 (CYP51) gene, exon 5


of02h05.s1 NCI_CGAP_Co12 Homo sapiens cDNA clone IMAGE:1419993 3' similar to gb:M87789 IG
GAMMA-1 CHAIN C REGION (HUMAN);


602022987F1 NCI_CGAP_Brn67 Homo sapiens cDNA clone IMAGE:4158525 5'


Neurospora crassa negative regulator sulfur controller-2 (scon-2) gene, complete cds


Lycopersicon esculentum peroxidase (TPX1) mRNA, complete cds


Escherichia coli translocated intimin receptor Tir (tir) gene, complete cds
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AF030154.1 


BF528188.1
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AA91 1629.1 


BF347847.1 
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L1 3654.1 


AF132728.1 
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(Top) Hit 
BLAST E 
Value 
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Top Hit Descriptor 


601126016F1 NIH_MGC_9 Homo sapiens cDNA clone IMAGE:2990043 5'


Bacteriophage T2 DNA-(adenine-N6)methyl transferase (dam) gene, complete cds


Homo sapiens acetylcholinesterase collagen-like tail subunit (COLQ) gene, exons 1A, 2, 3, 4, and 5


EST371580 MAGE resequences, MAGF Homo sapiens cDNA


QV1-BT0630-040400-132-e03 BT0630 Homo sapiens cDNA


Enterococcus faecium strain N97-330 vanD glycopeptide resistance gene cluster, complete cds; and
unknown gene


Gallus gallus mRNA for skeletal myosin heavy chain, complete cds


Gallus gallus mRNA for skeletal myosin heavy chain, complete cds


aa89d07.r1 Stratagene fetal retina 937202 Homo sapiens cDNA clone IMAGE:838477 5'


Arabidopsis thaliana PSI type III chlorophyll a/b-binding protein (Lhca3*1) mRNA, complete cds


Ophrestia radicosa maturase-like protein (matK) gene, complete cds; chloroplast gene for chloroplast product


Mus musculus metalloprotease disintegrin (Adam28) mRNA, complete cds


yj51e05.r1 Soares placenta Nb2HP Homo sapiens cDNA clone IMAGE:152288 5'


Paramecium caudatum gene for PAP, complete cds


td16a03.x1 NCI_CGAP_Co16 Homo sapiens cDNA clone IMAGE:2075788 3' similar to contains element
MER35 repetitive element ;


Homo sapiens protein translocase, JM26 protein, UDP-galactose translocator, pim-2 protooncogene homolog
pim-2h, and shal-type potassium channel genes, complete cds; JM12 protein and transcription factor IGHM
enhancer 3 genes, partial cds; and unknown g>


Thermotoga maritima section 123 of 136 of the complete genome


ts02e12.x1 NCI_CGAP_Pan1 Homo sapiens cDNA clone IMAGE:2227438 3' similar to SW:NDF1_RAT
Q64289 NEUROGENIC DIFFERENTIATION FACTOR 1 ;contains element LTR1 repetitive element ;


ts02e12.x1 NCI_CGAP_Pan1 Homo sapiens cDNA clone IMAGE:2227438 3' similar to SW:NDF1_RAT
Q64289 NEUROGENIC DIFFERENTIATION FACTOR 1 ;contains element LTR1 repetitive element ;


Neisseria meningitidis serogroup A strain Z2491 complete genome; segment 6/7


601581754F1 NIH_MGC_7 Homo sapiens cDNA clone IMAGE:3936156 5'


601581754F1 NIH_MGC_7 Homo sapiens cDNA clone IMAGE:3936156 5'


wd48c04.x1 Soares_NFL_TGBC_S1 Homo sapiens cDNA clone IMAGE:2331366 3' similar to gb:M37721
PEPTIDYL-GLYCINE ALPHA-AMIDATING MONOOXYGENASE PRECURSOR (HUMAN);
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Top Hit Descriptor 


Rattus norvegicus mRNA for acid gated ion channel


Pleurodeles waltl distal-less like protein PwDlx-3 (PwDlx-3) mRNA, complete cds


Rattus norvegicus mRNA for acid gated ion channel


nac39h12.x1 Lupski_sciatic_nerve Homo sapiens cDNA clone IMAGE:3395950 3' similar to contains element
MER38 repetitive element ;


oz14a10.x1 Soares_fetal_liver_spleen_1NFLS_S1 Homo sapiens cDNA clone IMAGE:1675290 3' similar to
TR:Q13040 Q13040 ATP-BINDING CASSETTE PROTEIN ;


Homo sapiens PPAR delta gene, promoter region


Fresh-water sponge Emf1 alpha collagen (COLF1) gene


602086608F1 NIH_MGC_83 Homo sapiens cDNA clone IMAGE:4249969 5'
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PM2-HT0353-2B1 299-003-812 HT0353 Homo sapiens cDNA j 


Homo sapiens FRA3B common fragile region, diadenosine triphosphate hydrolase (FHIT) gene, exon 5 


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 62


Xiphophorus maculatus truncated Rex1 retrotransposon reverse transcriptase (RT) pseudogene


Mus musculus breast/ovarian cancer susceptibility protein (BRCA1) mRNA, complete cds


Mus musculus mixed lineage kinase 3 (Mlk3) and two pore domain K+ channel subunit (Kcnk6) genes,
complete cds


Mus musculus MAP kinase kinase kinase 1 (Mekk1) mRNA, complete cds


Mus musculus MAP kinase kinase kinase 1 (Mekk1) mRNA, complete cds


Human scRNA (BC200 beta) pseudogene


Human scRNA (BC200 beta) pseudogene


Human beta-cytoplasmic actin (ACTBP9) pseudogene


zq87c05.r1 Stratagene hNT neuron (#937233) Homo sapiens cDNA clone IMAGE:648968 5'


Mus musculus vinculin gene, exon 3


histamine H2-receptor [rats, Genomic, 1928 nt]


Vidua chalybeata mitochondrion, complete genome


Homo sapiens diaphanous (Drosophila, homolog) 2 (DIAPH2), transcript variant 156, mRNA


Synechocystis sp. PCC6803 complete genome, 19/27, 2392729-2538999


Gallus gallus T-box containing protein (Ch-TbxT) mRNA, complete cds


Gallus gallus T-box containing protein (Ch-TbxT) mRNA, complete cds
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Top Hit Descriptor 


Marsupial cat beta-globin gene mRNA, partial cds


Marsupial cat beta-globin gene mRNA, partial cds


ol96g10.s1 NCI_CGAP_PNS1 Homo sapiens cDNA clone IMAGE:1537506 3' similar to contains Alu
repetitive element;


RC5-ET0082-060700-022-A02 ET0082 Homo sapiens cDNA


RC5-ET0082-060700-022-A02 ET0082 Homo sapiens cDNA


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 15


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 15


Homo sapiens calcium channel alpha1E subunit (CACNA1E) gene, exons 7-49, and partial cds, alternatively
spliced


ol96f02.s1 NCI_CGAP_PNS1 Homo sapiens cDNA clone IMAGE:1537467 3' similar to gb:L21696_cds1
PROTHYMOSIN ALPHA (HUMAN);contains element OFR repetitive element ;


ol96f02.s1 NCI_CGAP_PNS1 Homo sapiens cDNA clone IMAGE:1537467 3' similar to gb:L21696_cds1
PROTHYMOSIN ALPHA (HUMAN);contains element OFR repetitive element ;


Rattus norvegicus sodium channel I mRNA, complete cds


Homo sapiens partial 5-HT4 receptor gene, exons 2 to 5


Influenza A/Guangdong/243172 nucleoprotein (seg 5) gene, 5' end


Mus musculus ATP-binding cassette 1, sub-family A, member 1 (Abca1) gene, complete cds


Drosophila melanogaster clathrin light chain mRNA, complete cds
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| Mus musculus Cctg gene for chaperonln containing TCP-1 gamma subunlt, partial cds j 


Homo sapiens calcium channel, voltage-dependent, beta 2 subunit (CACNB2) mRNA, and translated
products


Oryzias latipes gene for membrane guanylyl cyclase OlGC1, complete cds
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|Dlctyostelium discoideum pla3mid Ddp5, complete genome j 


Yersinia pestis plasmid pCD1


Mus musculus guanylate nucleotide binding protein 1 (Gbp1), mRNA
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|Homo sapiens latent transforming growth factor beta binding protein 4 (LTBP4) mRNA | 


qg22d10.x5 NCI_CGAP_Kid3 Homo sapiens cDNA clone IMAGE:1761811 3' similar to TR:O75936 O75936
GAMMA BUTYROBETAINE HYDROXYLASE ;
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yj46e01.s1 Soares placenta Nb2HP Homo sapiens cDNA clone 1MAGE:151704 3' simfiar to contains Alu 
repetitive element; 






Mus musculus Scya6, Scya9, Scya16-ps, Scya5 genes for small inducible cytokine A6 precursor, small
inducible cytokine A9 precursor, Scya16 pseudogene, small inducible cytokine A5 precursor, complete cds
































Top Hit Descriptor 


Mus musculus Scya6, Scya9, Scya16-ps, Scya5 genes for small inducible cytokine A6 precursor, small
inducible cytokine A9 precursor, Scya16 pseudogene, small inducible cytokine A5 precursor, complete cds
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Jonopsidium acauie LEAFY protein (LEAFY2) gene, partial cds 
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QV0-BN0041-O703O0-147-CO4 BN0041 Homo sapiens cDNA 


yj45e01.s1 Soares placenta Nb2HP Homo sapiens cDNA clone IMAGE:151704 3' similar to contains Alu
repetitive element;


Bovine NB25 mRNA for MHC class II (BoLA-DQB), complete cds


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 56


S.tuberosum mRNA for alcohol dehydrogenase


MR3-ST0203-151299-112-g0e ST0203 Homo sapiens cDNA


an28g07.y5 Gessler Wilms tumor Homo sapiens cDNA clone IMAGE:1700028 5'


Mesocricetus auratus Na-taurocholate cotransporting polypeptide mRNA, partial cds


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 90


ytf38h08.r1 Soares melanocyte 2NbHM Homo sapiens cDNA clone IMAGE:264063 5'


Mus musculus Tnf receptor-associated factor 6 (Traf6), mRNA


Mus musculus Tnf receptor-associated factor 6 (Traf6), mRNA
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Citrullus lanatus mRNA for wsus, complete cds 


Citrullus lanatus mRNA for wsus, complete cds


Bacillus halodurans genomic DNA, section 5/14


Human cellular DNA/Human papillomavirus proviral DNA


Bacteriophage Ike, complete genome


nh02a05.s1 NCI_CGAP_Thy1 Homo sapiens cDNA clone IMAGE:943088 similar to contains L1.t3 L1
repetitive element ;
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Top Hil Descriptor 


tx69g05.x1 NCI_CGAP_Ut1 Homo sapiens cDNA clone IMAGE:2274872 3' similar to gb:M73779 RETINOIC
ACID RECEPTOR ALPHA-1 (HUMAN);


Human beta globin region on chromosome 11


Homo sapiens mevalonate kinase gene, exon 6 and 7
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Homo sapiens homeobox protein OTX2 gene, complete cds j 


AXONIN-1 PRECURSOR (AXONAL GLYCOPROTEIN TAG-1)


H.sapiens mRNA for novel T-cell activation protein


Homo sapiens mRNA for KIAA1308 protein, partial cds


Homo sapiens cytochrome P450 3A4 (CYP3A4) gene, promoter region


Homo sapiens cytochrome P450 3A4 (CYP3A4) gene, promoter region


Populus trichocarpa cv. Trichobel ABI3 gene


CO 

0> 

a> 
co 

s 

1 

6 
= 

I 


Vibrio cholerae chromosome II, section 70 of 93 of the complete chromosome


Homo sapiens apelin gene, complete cds


EST380677 MAGE resequences, MAGJ Homo sapiens cDNA


Mus musculus chaperonin subunit 3 (gamma) (Cct3), mRNA


MICRONUCLEAR LINKER HISTONE POLYPROTEIN (MIC LH) [CONTAINS: LINKER HISTONE
PROTEINS ALPHA, BETA, DELTA AND GAMMA]


J84h08.s1 Stratagene colon (#937204) Homo sapiens cDNA clone IMAGE:511361 3' similar to TR:E221955
E221955 38,855 BP SEGMENT OF CHROMOSOME XIV. ;


Lycopersicon esculentum RsaI fragment 2, satellite region


Plasmodium falciparum (strain Dd2) variant-specific surface protein (var-1) gene, complete cds


xm43f01.x1 NCI_CGAP_GC6 Homo sapiens cDNA clone IMAGE:2686969 3' similar to TR:O75984 O75984
HYPOTHETICAL 127.6 KD PROTEIN ;


xm43f01.x1 NCI_CGAP_GC6 Homo sapiens cDNA clone IMAGE:2686969 3' similar to TR:O75984 O75984
HYPOTHETICAL 127.6 KD PROTEIN ;


Rattus norvegicus CCAAT/enhancer binding protein epsilon (cebpe) gene, complete cds


RC3-BN0034-310800-113-N01 BN0034 Homo sapiens cDNA


601809725R1 NIH_MGC_18 Homo sapiens cDNA clone IMAGE:4040335 3'


601809725R1 NIH_MGC_18 Homo sapiens cDNA clone IMAGE:4040335 3'
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th92b1 2.X1 Soares_NSF_F8_9W_OT_PA_P_S1 Homo sapiens cDNA clone IMAGE:21261 1 1 3' similar to 
TR:O02710 002710 GAG POLYPROTEIN. ; 
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Top Hit Descriptor 


601193523F1 NIH_MGC_7 Homo sapiens cDNA clone IMAGE:3537581 5'


QV1-UM0036-080300-103-d09 UM0036 Homo sapiens cDNA 
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UI-H-BIO-aat-c-09-O-Ul.s1 NCl_CGAP_Sub1 Homo sapiens cDNA clone IMAGE:2710289 3' 


Oryctolagus cuniculus fructose 1,6, bisphosphate aldolase (AldB) gene, complete cds


ql90b12.x1 Soares_NhHMPu_S1 Homo sapiens cDNA clone IMAGE:1879583 3'


AV659047 GLC Homo sapiens cDNA clone GLCFSH06 3'


EST178192 Colon carcinoma (HCC) cell line Homo sapiens cDNA 5' end


df58b03.y1 Morton Fetal Cochlea Homo sapiens cDNA clone IMAGE:2487485 5'


yi10h05.r1 Soares placenta Nb2HP Homo sapiens cDNA clone IMAGE:138873 5'


yi10h05.r1 Soares placenta Nb2HP Homo sapiens cDNA clone IMAGE:138873 5'


601895465F1 NIH_MGC_19 Homo sapiens cDNA clone IMAGE:4124824 5'


zd94a04.r1 Soares_fetal_heart_NbHH19W Homo sapiens cDNA clone IMAGE:357102 5' similar to contains
element KER repetitive element ;
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M.vannieJii genes rpoH, rpoB and rpoA 


Homo sapiens PHEX gene 


Homo sapiens PHEX gene 


Drosophila melanogaster signal transducting adaptor protein (STAM), serine threonine kinase lal (1A1
zinc finger protein (DNZ1) genes, complete cds


C.perfringens ORF for putative membrane transport protein


Macromitrium levatum small ribosomal protein 4 (rps4) gene, chloroplast gene encoding chloroplast 1
partial cds


df29h08.y1 Morton Fetal Cochlea Homo sapiens cDNA clone IMAGE:2485094 5'


df29h08.y1 Morton Fetal Cochlea Homo sapiens cDNA clone IMAGE:2485094 5'


MR3-ST0218-211299-013-a08 ST0218 Homo sapiens cDNA


MR3-ST0218-211299-013-a08 ST0218 Homo sapiens cDNA


yd47d03.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:111365 5'


Bacillus subtilis complete genome (section 14 of 21): from 2699451 to 2812870


oa99a03.s1 NCI_CGAP_GCB1 Homo sapiens cDNA clone IMAGE:1320364 3'
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Top Hit Descriptor 


Botrytis cinerea strain T4 cDNA library under conditions of nitrogen deprivation


AV712467 DCA Homo sapiens cDNA clone DCAAFF05 5'


Homo sapiens adapter protein CMS mRNA, complete cds


Mus musculus procollagen, type XI, alpha 1 (Col11a1), mRNA


Botrytis cinerea strain T4 cDNA library under conditions of nitrogen deprivation
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RC4-ST0173-191099-032-d12ST0173 Homo sapiens cDNA | 


Archaeoglobus fulgidus section 91 of 172 of the complete genome


Carassius auratus keratin type I mRNA, complete cds


Homo sapiens chromosome 21 segment HS21C007


Bovine branched chain alpha-keto acid dihydrolipoyl transacylase mRNA, complete cds


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 77


Bacteriophage SPBc2 complete genome


QV3-DT0018-081299-036-a03 DT0018 Homo sapiens cDNA


Schistosoma mansoni fructose bisphosphate aldolase mRNA, complete cds
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AV752279 NPD Homo sapiens cDNA clone NPDA2E02 5' | 


AV752279 NPD Homo sapiens cDNA clone NPDA2E02 5'


Homo sapiens chromosome 21 segment HS21C080


Bovine branched chain alpha-keto acid dihydrolipoyl transacylase mRNA, complete cds


601126096F1 NIH_MGC_9 Homo sapiens cDNA clone IMAGE:2990063 5'


RC4-TN0077-180900-012-c05 TN0077 Homo sapiens cDNA


ha07b06.x1 NCI_CGAP_Kid12 Homo sapiens cDNA clone IMAGE:2872979 3' similar to contains L1.M LI
L1 repetitive element ;


QV0-UM0093-100400-189-a06 UM0093 Homo sapiens cDNA


Emericella nidulans DNA-dependent RNA polymerase II RPB140 (RPB2) gene, partial cds


Hepatitis C virus 68_CL10 genome polyprotein gene, partial cds


601874591F1 NIH_MGC_54 Homo sapiens cDNA clone IMAGE:4101119 5'


602039337F2 NCI_CGAP_Brn67 Homo sapiens cDNA clone IMAGE:4177233 5'


602039337F2 NCI_CGAP_Brn67 Homo sapiens cDNA clone IMAGE:4177233 5'


C.jacchus intron 4 of visual pigment gene (red allele)


26f3 Human retina cDNA randomly primed sublibrary Homo sapiens cDNA 3
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Top Hit Descriptor 


Wheat mRNA for a group 3 late embryogenesis abundant protein (LEA)


P.clarkii mRNA; repeat region (ID 2MRT7)


P.clarkii mRNA; repeat region (ID 2MRT7)


L.esculentum mRNA for glyoxalase-I


Rana ridibunda pituitary adenylate cyclase-activating polypeptide variant 2 precursor, mRNA, complete cds,
alternatively spliced


ny63c04.s1 NCI_CGAP_GCB1 Homo sapiens cDNA clone IMAGE:1282950 3'


Homo sapiens calcium channel alpha1E subunit (CACNA1E) gene, exons 7-49, and partial cds, alternatively
spliced
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i 6020231 12F1 NCI_CGAP_Brn67 Homo sapiens cDNA clone IMAGE:41 58386 5' | 


JC virus agnoprotein, VP2, VP3, VP1, large T antigen, and small t antigen genes, complete cds
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wc99g03.x1 NCI CGAP Co3 Homo sapiens cDNA clone IMAGE:2325804 3' similar to SW :GST2 HUMAN 
Q99735 MICROSOMAL GLUTATHIONE S-TRANSFERASE II ; 


NADH-UBIQUINONE OXIDOREDUCTASE B22 SUBUNIT (COMPLEX I-B22) (CI-B22)
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Staphylococcus aureus plasmid pSK23 putative recombinase Sin (sin) gene, partial cds; and transcriptional 
regulator QacR (qacR) and multidrug efflux protein QacB (qacB) genes, complete cds 


N.crassa vacuolar ATPase 57-Kd subunit (vma-2) gene, complete cds


N.crassa vacuolar ATPase 57-Kd subunit (vma-2) gene, complete cds


Homo sapiens Xq pseudoautosomal region; segment 2/2


Haemophilus influenzae Rd section 29 of 163 of the complete genome
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M.musculus DNA fragment of Apolipoprotein B gene | 


S.cerevisiae HXT5 gene


AV710857 Cu Homo sapiens cDNA clone CuAAKE08 5'
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Yeast MPT5 gene for suppressor protein, complete cds j 


601655578R1 NIH_MGC_65 Homo sapiens cDNA clone IMAGE:3846283 3'


601900763F1 NIH_MGC_19 Homo sapiens cDNA clone IMAGE:4130103 5'


Homo sapiens dynein intermediate chain DNAI1 (DNAI1) gene, exon 17


yf80c02.s1 Soares infant brain 1NIB Homo sapiens cDNA clone IMAGE:28880 3'


Rabbit glycogen-associated protein phosphatase regulatory subunit (RG1) mRNA, complete cds


AV658033 GLC Homo sapiens cDNA clone GLCFIB12 3'


Homo sapiens Xq pseudoautosomal region; segment 2/2


MACROPHAGE-STIMULATING PROTEIN RECEPTOR PRECURSOR (MSP RECEPTOR) (P185-RON)
(CDW136) (CD136 ANTIGEN)


Drosophila melanogaster strain Oregon R potential RNA-binding protein gene, complete cds; and syntaxin
gene, partial cds


R.norvegicus NF68 gene for 68kDa neurofilament


QV4-BT0234-111199-031-g10 BT0234 Homo sapiens cDNA
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Xenopus laevis integrin alpha 3 subunit mRNA, partial cds J 


CYCLIN T


Vibrio cholerae chromosome II, section 85 of 93 of the complete chromosome


Bacillus subtilis complete genome (section 15 of 21): from 2795131 to 3013540


Mus musculus protein (16kDa) similar to human SYK interacting protein (p16K), mRNA


tn18d08.x1 NCI_CGAP_Brn25 Homo sapiens cDNA clone IMAGE:2167983 3'


nm08g11.s1 NCI_CGAP_Co10 Homo sapiens cDNA clone IMAGE:1059620 3' similar to gb:X06985_rna1
HEME OXYGENASE 1 (HUMAN);


602129847F1 NIH_MGC_58 Homo sapiens cDNA clone IMAGE:4286771 5'


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 60


EST384142 MAGE resequences, MAGL Homo sapiens cDNA


Synechocystis sp. PCC6803 complete genome, 23/27, 2868767-3002965


AU140363 PLACE2 Homo sapiens cDNA clone PLACE2000403 5'


Mus musculus pre T-cell antigen receptor alpha (Ptcra), mRNA
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X53981.1 


BE061418.1


1 


LIOW.I 


1 
§ 


i 

o 
o 


Z83118.1 


9845282


1 


to 
in 


BF697308.1


AL161560.2 


AW972158.1 


D64004.1 


AU140363.1 


67552151 
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(Top) Hit 
BLAST E 

' Value 


1.2E-01
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s 

cm 
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UJ 

CN 


2 

CM 


3 

CM 
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1 
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3 

CM 


o 

CM 


1 


I 
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i 
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2 


3 


3 


1.1E-01] 


1.1E-01| 


Expression


2.17


3.18


1.58


2.61


1.65


1.67


2.53


3.52


2.87


3.16


2.11


1.44


5.86


7.95


1.39


1.81


1.38


1.65


1.85


ORF SEQ
ID NO:


35252


35750


36543


37238


30611


31007


30989


30614


26010


26452


25552


26648


26943

Exon
SEQ ID
NO:


21315


22080


22555


23305


23484


23672


23764


23944


24093


24683


24796


24872


24917


24932


25289


24982


16228


25372


13334


13379


13821


15561


13977


14257


15037


Probe
SEQ ID
NO:


8623


9471


9066


10611


10801


WO 01/57275



PCT/US01/00667 






Top Hit Descriptor


Drosophila melanogaster ftz gene


601085354F1 NIH_MGC_10 Homo sapiens cDNA clone IMAGE:3451933 5'


Bacillus halodurans genomic DNA, section 1/14


Drosophila melanogaster cAMP-dependent protein kinase type II regulatory subunit (pka-RII) mRNA,
complete cds


601070219F1 NIH_MGC_12 Homo sapiens cDNA clone IMAGE:3456365 5'


601070219F1 NIH_MGC_12 Homo sapiens cDNA clone IMAGE:3456365 5'


Homo sapiens neurexin III-alpha gene, partial cds


zu46c03.x5 Soares ovary tumor NbHOT Homo sapiens cDNA clone IMAGE:740932 3'


7d77c12.x1 NCI_CGAP_Lu24 Homo sapiens cDNA clone IMAGE:3278998 3'


Aspergillus terreus BSD mRNA for blasticidin S deaminase, complete cds


xd43c09.x1 NCI_CGAP_Ov23 Homo sapiens cDNA clone IMAGE:2596528 3' similar to contains Alu
repetitive element;contains element MIR MIR repetitive element ;


xd43c09.x1 NCI_CGAP_Ov23 Homo sapiens cDNA clone IMAGE:2596528 3' similar to contains Alu
repetitive element;contains element MIR MIR repetitive element ;


Mus musculus phospholipid transfer protein (Pltp), mRNA


O.sativa RAmy3C gene for alpha-amylase


Homo sapiens I factor (complement) (IF) mRNA


Daucus carota leucoanthocyanidin dioxygenase 2 (LDOX) mRNA, LDOX-2 allele, complete cds


Leptosphaeria maculans beta-tubulin mRNA, complete cds


Leptosphaeria maculans beta-tubulin mRNA, complete cds


Human HPTP delta mRNA for protein tyrosine phosphatase delta


601460793F1 NIH_MGC_66 Homo sapiens cDNA clone IMAGE:3864287 5'


Rattus norvegicus microtubule-associated protein tau (Mapt), mRNA


Aloe arborescens mRNA for NADP-malic enzyme, complete cds


Homo sapiens fibroblast growth factor receptor 3 (achondroplasia, thanatophoric dwarfism) (FGFR3) mRNA


CELL SURFACE A33 ANTIGEN PRECURSOR (GLYCOPROTEIN A33)


Caulobacter crescentus thymidylate kinase (tmk) and DNA polymerase III delta prime subunit (dnaC) genes,
complete cds


Top Hit
Database
Source


NT


EST_HUMAN


NT


NT


EST_HUMAN


EST_HUMAN


NT


EST_HUMAN


EST_HUMAN


NT


EST_HUMAN


EST_HUMAN


NT


NT


NT


NT


NT


NT


NT


NT


EST_HUMAN


SWISSPROT


NT


BE537719.1


AP001507.1


AF274008.1


BE545554.1


BE545554.1


AF099810.1


BE674249.1


D83710.1


AW103088.1


AW103088.1


6755111


X56338.1


4504578


AF184274.1


AF257329.1


AF257329.1


X54133.1


M61943.1


BF037421.1


8393751


BE168660.1


AF099189.1


Most Similar
(Top) Hit
BLAST E
Value


1.0E-01


9.9E-02


9.9E-02


9.9E-02


9.8E-02


9.8E-02


9.8E-02


9.8E-02


9.8E-02


9.7E-02


9.7E-02


9.7E-02


9.7E-02


9.7E-02


Expression
Signal


2.22


2.74


1.27


1.53


1.53


1.32


0.75


0.93


9.17


0.93


0.93


0.98


1.43


6.24


0.77


1.18


1.31


1.49


2.08


3.48


0.94


ORF SEQ
ID NO:


28224


28235


28236


28871


30003


30547


33634


33635


35055


28550


29564


29565


36436


26752


27724


30595


Exon
SEQ ID
NO:


24738


24921


25002


15486


15495


15495


16022


17367


17951


20510


20510


21887


13333


15865


9069 V j


16939


20061


21884


23204


24570


14077


14326


14884


16714


18067


Probe
SEQ ID
NO:


12317


12614


12733


2781


2790


2790


3933


4632


6875


7815


7815


9156


3100


3142


4198


4198


7381


9153


11437


12052


1328


1680


2257


3965


5261






Top Hit Descriptor


Caulobacter crescentus thymidylate kinase (tmk) and DNA polymerase III delta prime subunit (dnaC) genes,
complete cds


EST366546 MAGE resequences, MAGC Homo sapiens cDNA


Bacillus subtilis complete genome (section 16 of 21): from 2997771 to 3213410


yw41c03.s1 Weizmann Olfactory Epithelium Homo sapiens cDNA clone IMAGE:254788 3'


yw41c03.s1 Weizmann Olfactory Epithelium Homo sapiens cDNA clone IMAGE:254788 3'


wx78b06.x1 NCI_CGAP_Ov38 Homo sapiens cDNA clone IMAGE:2549747 3' similar to gb:X52851_rna1
PEPTIDYL-PROLYL CIS-TRANS ISOMERASE A (HUMAN);


Mus musculus ligatin (Lgtn) mRNA, partial cds


oz47d11.x1 Soares_NhHMPu_S1 Homo sapiens cDNA clone IMAGE:1678485 3'


Proteus mirabilis fimbrial operon, strain HI4320


EST378303 MAGE resequences, MAGI Homo sapiens cDNA


601498088F1 NIH_MGC_70 Homo sapiens cDNA clone IMAGE:3900165 5'


AU137084 PLACE1 Homo sapiens cDNA clone PLACE1005740 5'


AV687898 GKC Homo sapiens cDNA clone GKCAAH02 5'


Homo sapiens DMBT1 candidate tumour suppressor gene, exons 1 to 55


Homo sapiens DMBT1 candidate tumour suppressor gene, exons 1 to 55


602086769F1 NIH_MGC_83 Homo sapiens cDNA clone IMAGE:4250969 5'


Antirrhinum majus transposon Tam3 pseudogene for transposase (In S-5 copy)


Antirrhinum majus transposon Tam3 pseudogene for transposase (In S-5 copy)


COMPLEMENT DECAY-ACCELERATING FACTOR PRECURSOR (CD55)


Mycobacterium tuberculosis H37Rv complete genome; segment 102/162


CM2-BN0023-050200-087-f12 BN0023 Homo sapiens cDNA


TRANSKETOLASE 2 (TK 2) (TRANSKETOLASE RELATED PROTEIN)


ac88a09.s1 Stratagene fetal retina 937202 Homo sapiens cDNA clone IMAGE:867736 3'


Trimeresurus flavoviridis DNA for phospholipase A2 inhibitor, complete cds


Arabidopsis thaliana DNA chromosome 4, contig fragment No. 38


TRANSKETOLASE 2 (TK 2) (TRANSKETOLASE RELATED PROTEIN)


601453642F1 NIH_MGC_66 Homo sapiens cDNA clone IMAGE:3857243 5'


601453642F1 NIH_MGC_66 Homo sapiens cDNA clone IMAGE:3857243 5'


601453642F1 NIH_MGC_66 Homo sapiens cDNA clone IMAGE:3857243 5'


Top Hit
Database
Source


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


NT


EST_HUMAN


EST_HUMAN


NT


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


EST_HUMAN


NT


NT


EST_HUMAN


NT


SWISSPROT


NT


EST_HUMAN


EST_HUMAN


SWISSPROT


EST_HUMAN


NT


SWISSPROT


EST_HUMAN


EST_HUMAN


EST_HUMAN


Top Hit Accession
No.


AF099189.1


AW954476.1


Z99119.1


N22798.1


AI080721.1


Z32686.2


BE910039.1


AU137084.1


AV687898.1


BE894895.1


AJ243211.1


AJ243211.1


BF677270.1


AB013985.1


AB013986.1


P08174


Z79702.1


AW992395.1


P51854


AA780728.1


AB003473.1


P51854


BF035861.1


BF035861.1


BF035861.1


Most Similar
(Top) Hit
BLAST E
Value


9.7E-02


9.7E-02


9.7E-02


9.7E-02


9.7E-02


9.6E-02


9.6E-02


9.6E-02


9.6E-02


9.6E-02


3.13


ORF SEQ
ID NO:


30696


31682


32958


33705


33706


34587


27470


27471


29675


30276


35300


35790


35791


35884


35915


35916


36024


36572


29452


31289


32729


33234


33600


33601


36509


Exon
SEQ ID
NO:


18067


18708


19884


20577


20577


21440


23819


14744


14744


17050


17668


18795


22121


22668


22699


22806


23334


24954


16825


18376


19888


20141


18376


20476


20475


23273


NO:


7198


4940


6014


Top Hit Descriptor


za68a12.r1 Soares_fetal_lung_NbHL19W Homo sapiens cDNA clone IMAGE:297694 5' similar to
PIR:S52171 S52171 small G protein - human ;


7h63d03.x1 NCI_CGAP_Co16 Homo sapiens cDNA clone IMAGE:3320645 3' similar to contains Alu
repetitive element;


yl11b08.s1 Soares placenta Nb2HP Homo sapiens cDNA clone IMAGE:138903 3'


Escherichia coli strain E2348/69 pathogenicity island, rOrf1 (rorf1), rOrf2 (rorf2), EscR (escR), EscS (escS),
EscT (escT), EscU (escU), CesD (cesD), EscC (escC), EscJ (escJ), SepZ (sepZ), EscV (escV), EscN
(escN), SepQ (sepQ), Tir (tir), OrfU (orfU), >


602129030F2 NIH_MGC_56 Homo sapiens cDNA clone IMAGE:4285951 5'


602129030F2 NIH_MGC_56 Homo sapiens cDNA clone IMAGE:4285951 5'


PM0-HT0339-251199-003-d01 HT0339 Homo sapiens cDNA


Atrichum angustatum AtranFlo2 protein (AtranFlo2) gene, partial cds


UI-H-BI3-aio-f-08-0-UI.s1 NCI_CGAP_Sub5 Homo sapiens cDNA clone IMAGE:3068294 3'


Homo sapiens similar to endoglycan (H. sapiens) (LOC63107), mRNA


FOLD BIFUNCTIONAL PROTEIN [INCLUDES: METHYLENETETRAHYDROFOLATE
DEHYDROGENASE ; METHENYLTETRAHYDROFOLATE CYCLOHYDROLASE ]


H.sapiens flow-sorted chromosome 6 HindIII fragment, SC6pA20F8


NITRIC-OXIDE SYNTHASE, BRAIN (NOS, TYPE I) (NEURONAL NOS) (N-NOS) (NNOS)
(CONSTITUTIVE NOS) (NC-NOS) (BNOS)


602129111F2 NIH_MGC_56 Homo sapiens cDNA clone IMAGE:4285827 5'


602129111F2 NIH_MGC_56 Homo sapiens cDNA clone IMAGE:4285827 5'


EST180187 Liver, hepatocellular carcinoma Homo sapiens cDNA 5' end


qu55c05.x1 NCI_CGAP_Lym8 Homo sapiens cDNA clone IMAGE:1988680 3' similar to contains MER10.b1
MER10 repetitive element ;


qu55c05.x1 NCI_CGAP_Lym6 Homo sapiens cDNA clone IMAGE:1968680 3' similar to contains MER10.b1
MER10 repetitive element ;


EST44454 Fetal brain I Homo sapiens cDNA 5' end


HYPOTHETICAL 51.7 KD PROTEIN IN THRC-TALB INTERGENIC REGION (ORF8)


MYOSIN-2 ISOFORM


602129682F1 NIH_MGC_56 Homo sapiens cDNA clone IMAGE:4286180 5'


Human 4-hydroxyphenylpyruvate-dioxygenase gene, complete cds


PROBABLE DNA LIGASE (POLYDEOXYRIBONUCLEOTIDE SYNTHASE [ATP])


EST11595 Uterus Homo sapiens cDNA 5' end


Top Hit
Database
Source


Top Hit Accession
No.


W56037.1


BF062651.1


R62805.1


AF022236.1


BF701593.1


BF701593.1


BE153572.1


AF286055.1


AW452122.1


AW452122.1


AI285627.1


P30143


P19524


BF696918.1


U29895.1


Q27474


Most Similar
(Top) Hit


Expression


1.14


ORF SEQ
ID NO:


31840


32619


26849


27846


31474


31496


32848


33768


33858


33859


34331


35356


35357


35477


26775


29272


Exon
SEQ ID
NO:


18891


19381


19585


24845


14166


15107


16915


18552


18552


18567


19782


20132


20644


20725


20725


21188


22173


22173


25173


25207


24591


24716


14100


16633


Probe
SEQ ID
NO:


5906


6619


6668


12486


1418


1418


2386


4175


5760


5760


5776


7093


8030


8030


8496


jcl Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2350221 3' similar to contains
:MSR1 repetitive element ;


















contains Ll.tl L 




32 GAMMA 
















Top Hit Descriptor


"36 Ovary li Homo sapiens cDNA 5' end


orcellus glycoprotein alpha-subunit mRNA, complete cds


orcellus glycoprotein alpha-subunit mRNA, complete cds


apiens mRNA, similar to rat myomegalin, complete cds


601190436F1 NIH_MGC_7 Homo sapiens cDNA clone IMAGE:3534393 5'


apiens mRNA for FLJ00050 protein, partial cds


1*0790-260400-162-d05 BT0790 Homo sapiens cDNA


apiens attractin precursor (ATRN) gene, exon 2


3.x1 Barstead colon HPLRB7 Homo sapiens cDNA clone IMAGE:2335842 3' sinr
GOB-4. ;


.r1 Soares placenta Nb2HP Homo sapiens cDNA clone IMAGE:145895 5'


lexagonus mitochondrion, complete genome


lexagonus mitochondrion, complete genome


HETICAL LIPOPROTEIN MG309 HOMOLOG PRECURSOR


3.x1 Soares_NhHMPu_S1 Homo sapiens cDNA clone IMAGE:2125210 3'


5.x1 Soares_NhHMPu_S1 Homo sapiens cDNA clone IMAGE:2125210 3'


IjcI NCI_CGAP_Kid11 Homo sapiens cDNA clone IMAGE:2461581 3'


norvegicus dystrophin-related protein 2 A-form splice variant (Drp2) mRNA, comp


og88g08.s1 NCI_CGAP_Kid5 Homo sapiens cDNA clone IMAGE:1455422 3' similar to .
repetitive element ;


).s1 NCI_CGAP_Kid5 Homo sapiens cDNA clone IMAGE:1592779 3'


.x1 Human Pancreatic Islets Homo sapiens cDNA 3' similar to TR:Q15332 Q1 52
I IT OF SODIUM POTASSIUM ATPASE LIKE. ;


psis thaliana DNA chromosome 4, contig fragment No. 91


telium discoideum DocA (docA) mRNA, complete cds


jallus mRNA for OBCAM protein gamma isoform


amiliaris glutamate transporter (EAAT4) mRNA, complete cds


apiens chromosome 21 segment HS21C006


psis thaliana DNA chromosome 4, contig fragment No. 10


Homo s


JMAN


JMAN


JMAN


NT


NT


NT


NT


NT


Top Hit Accession
No.


AA382934.1


AI827586.1


AF257213.1


AF257213.1


BE267153.1


AK024458.1


BE096074.1


AF218890.1


AI735184.1


R79408.1


5835680


5835680


AI942338.1


AF052683.1


AA987873.1


AW583503.1


AL161595.2


AF020409.1


8.4E-I


9.84


3.05


1.99


ORF SEQ
ID NO:


30472


33761


27463


27468


16553


14741


22116






Top Hit Descriptor


Homo sapiens SCL gene locus


RC1-HT0545-020800-017-d06 HT0545 Homo sapiens cDNA


601654915R1 NIH_MGC_57 Homo sapiens cDNA clone IMAGE:3839810 3'


L.esculentum mRNA for triose phosphate translocator


L.esculentum mRNA for triose phosphate translocator


QV3-BN0046-150400-151-e04 BN0046 Homo sapiens cDNA


Homo sapiens solute carrier family 6 (neurotransmitter transporter, glycine), member 9 (SLC6A9), mRNA


Homo sapiens solute carrier family 6 (neurotransmitter transporter, glycine), member 9 (SLC6A9), mRNA


wq24h09.x1 NCI_CGAP_Kid11 Homo sapiens cDNA clone IMAGE:2472257 3'


wl52b02.x1 NCI_CGAP_Brn25 Homo sapiens cDNA clone IMAGE:2428491 3' similar to gb:M14328 ALPHA
ENOLASE (HUMAN);


AU116913 HEMBA1 Homo sapiens cDNA clone HEMBA1000264 5'


7o61c05.x1 NCI_CGAP_Pr28 Homo sapiens cDNA clone IMAGE:3578504 3' similar to contains element
MER27 repetitive element ;


601870205F1 NIH_MGC_19 Homo sapiens cDNA clone IMAGE:4100449 5'


C.fimi DSM 20113 16S rDNA


RC5-LT0054-260100-011-H09 LT0054 Homo sapiens cDNA


Equine herpesvirus 4 strain NS80567, complete genome


Mus musculus paired-like homeodomain transcription factor 1 (Pitx1), mRNA


wf43h01.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2358385 3'


Homo sapiens ADP/ATP carrier protein (ANT-2) gene, complete cds


Rattus norvegicus Activin receptor like kinase 1 (Acvrl1), mRNA


Mus musculus ubiquitin c-terminal hydrolase related polypeptide (Uchrp), mRNA


yg14g06.r1 Soares infant brain 1NIB Homo sapiens cDNA clone IMAGE:32339 5'


Human periodic tryptophan protein 2 (PWP2) gene, exons 15 to 21, and complete cds


hh87d11.y1 NCI_CGAP_GU1 Homo sapiens cDNA clone IMAGE:2967861 5' similar to SW:SCA2_HUMAN
O15127 SECRETORY CARRIER-ASSOCIATED MEMBRANE PROTEIN 2. ;


Top Hit
Database
Source


Top Hit Descriptor


601143974F1 NIH_MGC_15 Homo sapiens cDNA clone IMAGE:3051234 5'


COLLAGEN ALPHA 1(XVI) CHAIN PRECURSOR


zl66f04.s1 Stratagene colon (#937204) Homo sapiens cDNA clone IMAGE:609599 3'


UI-H-BI1-acy-c-07-0-UI.s1 NCI_CGAP_Sub3 Homo sapiens cDNA clone IMAGE:2716020 3'


al65a12.s1 Soares_testis_NHT Homo sapiens cDNA clone 1375678 3' similar to gb:K03002 60S
RIBOSOMAL PROTEIN L32 (HUMAN);


CM0-UM0001-060300-270-e12 UM0001 Homo sapiens cDNA


Canis familiaris inducible nitric oxide synthase mRNA, complete cds


601816291F1 NIH_MGC_56 Homo sapiens cDNA clone IMAGE:4050071 5'


Lumbricus rubellus mRNA for cyclophilin B


AV689285 GKC Homo sapiens cDNA clone GKCCAE06 5'


Gallus gallus mRNA for partial aczonin, XL spliced variant (acz gene)


African swine fever virus, complete genome


Human myosin binding protein H (MyBP-H) gene, complete cds


ah99a05.s1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:1327184 3' similar to gb:L14837
TIGHT JUNCTION PROTEIN ZO-1 (HUMAN);


Homo sapiens chromosome 21 segment HS21C010


Homo sapiens chromosome 21 segment HS21C010


Homo sapiens regulator of Gz-selective protein signaling (ZGAP1) mRNA, and translated products


26S PROTEASOME REGULATORY SUBUNIT S3 (NUCLEAR ANTIGEN 21D7)


26S PROTEASOME REGULATORY SUBUNIT S3 (NUCLEAR ANTIGEN 21D7)


Enterococcus faecium cysteine aminopeptidase (pepC) gene, partial cds; phospho-beta-glucosidase BglB
(bglB), beta-glucoside specific transport protein (bglS), transcription antiterminator (bglR), enterocin B
precursor (enS), enterocin B immunity prote>


601340661F1 NIH_MGC_53 Homo sapiens cDNA clone IMAGE:3683030 5'


Barbarie duck parvovirus REP protein (rep) and three capsid protein VP (vp) genes, complete cds


X.laevis XFD2 mRNA for fork head protein


SEQ ID
NO:


24483


13299


14233


14498


15793


16628


16861


17604


18098


20177


21689


22150


22502


24035


13285


13285


14058


16525


16525


17831


17845


20187


21143


21143


21708


24580


Probe
SEQ ID
NO:


1486


4119


7300


7506


PROTEIN TRANSPORT PROTEIN HOFC HOMOLOG


Homo sapiens membrane-bound aminopeptidase P (XNPEP2) gene, complete cds


ae30f02.r1 Gessler Wilms tumor Homo sapiens cDNA clone IMAGE:897339 5' similar to gb:M22382
MITOCHONDRIAL MATRIX PROTEIN P1 PRECURSOR (HUMAN);


ae30f02.r1 Gessler Wilms tumor Homo sapiens cDNA clone IMAGE:897339 5' similar to gb:M22382
MITOCHONDRIAL MATRIX PROTEIN P1 PRECURSOR (HUMAN);


Homo sapiens putative hepatic transcription factor (WBSCR14) gene, complete cds


ai75a06.s1 Soares_testis_NHT Homo sapiens cDNA clone 1376626 3'


CELL-SURFACE RECEPTOR DAF-1 PRECURSOR


RC1-BT0254-090300-017-d09 BT0254 Homo sapiens cDNA


Homo sapiens chromosome 21 segment HS21C088


Dictyostelium discoideum myosin heavy chain kinase A (MHCK A) mRNA, complete cds


Pyrococcus abyssi complete genome; segment 5/6


Pyrococcus abyssi complete genome; segment 5/6


FB4A8 Fetal brain, Stratagene Homo sapiens cDNA clone FB4A8 3'end similar to LINE-1


ah67f05.s1 Soares_testis_NHT Homo sapiens cDNA clone 1320705 3'


EST387948 MAGE resequences, MAGN Homo sapiens cDNA


Mus musculus latent TGF beta binding protein (Tgfb), mRNA


Oncorhynchus mykiss TAP1 protein (OnmyTAP1) mRNA, OnmyTAP1*01 allele, complete cds


qg79e04.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:1841406 3'


HOMEOBOX PROTEIN HOX-D4 (CHOX-A)


H.sapiens DNA for cGMP phosphodiesterase (exons 4-22)


H.sapiens DNA for cGMP phosphodiesterase (exons 4-22)


xb61d 1 .x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2580788 3'


Drosophila melanogaster cactin mRNA, complete cds


yi18b10.s1 Soares placenta Nb2HP Homo sapiens cDNA clone IMAGE:139579 3'


Homo sapiens mesothelin (MSLN), transcript variant 1, mRNA


Homo sapiens mesothelin (MSLN), transcript variant 1, mRNA


Top Hit
Database
Source


AF195953.1


AA496759.1


AA498759.1


AF156673.1


AA781996.1


AA781996.1


BE141076.1


P20792


24817


14613


14613


Top Hit Descriptor


Homo sapiens TESTIN 2 and TESTIN 3 genes, complete cds, alternatively spliced


INTER-ALPHA-TRYPSIN INHIBITOR HEAVY CHAIN H2 PRECURSOR (ITI HEAVY CHAIN H2)


INTER-ALPHA-TRYPSIN INHIBITOR HEAVY CHAIN H2 PRECURSOR (ITI HEAVY CHAIN H2)


P.vulgaris mRNA for chalcone synthase


MATERNAL EFFECT PROTEIN STAUFEN


MATERNAL EFFECT PROTEIN STAUFEN


Homo sapiens chemokine receptor CXCR4 gene, promoter region and complete cds


Dictyostelium discoideum darlin (darA) gene, complete cds


DNA POLYMERASE ZETA CATALYTIC SUBUNIT (HREV3)


Human respiratory syncytial virus, complete genome


Human respiratory syncytial virus, complete genome


Homo sapiens chemokine receptor CXCR4 gene, promoter region and complete cds


Mus musculus DIPB gene (Dipb), mRNA


Rattus norvegicus cytochrome P450 2E1 (CYP2E1) gene, 5' flanking region


601671046F1 NIH_MGC_20 Homo sapiens cDNA clone IMAGE:3954178 5'


Homo sapiens E2F-like protein (LOC51270), mRNA


Xenopus laevis alpha(E)-catenin mRNA, complete cds


Aquifex aeolicus section 96 of 109 of the complete genome


zv46h12.s1 Soares ovary tumor NbHOT Homo sapiens cDNA clone IMAGE:756743 3' similar to gb:M26038
HLA CLASS II HISTOCOMPATIBILITY ANTIGEN, DR-5 BETA CHAIN (HUMAN);


601656817R1 NIH_MGC_67 Homo sapiens cDNA clone IMAGE:3865637 3'


601656817R1 NIH_MGC_67 Homo sapiens cDNA clone IMAGE:3865837 3'


601823511F1 NIH_MGC_77 Homo sapiens cDNA clone IMAGE:4043138 5'


Rabbit microsomal epoxide hydrolase


Nectria haematococca kinesin related protein 2 (KRP2) gene, complete cds


A.carterae precursor of peridinin-chlorophyll a-protein (PCP) gene


Mus musculus histone deacetylase 5 (Hdac5), mRNA


Mus musculus histone deacetylase 5 (Hdac5), mRNA


Top Hit Descriptor


Homo sapiens hypothetical protein SIRP-b2 (SIRP-b-2), mRNA


Oryza sativa rbbiS-1 gene for putative Bowman Birk trypsin inhibitor


RC5-BT0559-140200-012-C03 BT0559 Homo sapiens cDNA


Hirudo medicinalis SNAP-25 homolog mRNA, complete cds


Bacillus subtilis complete genome (section 13 of 21): from 2395261 to 2913730


Homo sapiens TESTIN 2 and TESTIN 3 genes, complete cds, alternatively spliced


AU120889 HEMBB1 Homo sapiens cDNA clone HEMBB1001630 5'


Neurospora crassa ubiquinol-cytochrome c oxidoreductase subunit VIII (QCR8) mRNA, complete cds


RC6-FN0112-190700-021-D06 FN0112 Homo sapiens cDNA


RC6-FN0112-190700-021-D06 FN0112 Homo sapiens cDNA


QV0-ST0213-021299-082-a09 ST0213 Homo sapiens cDNA


ye37f12.r1 Stratagene lung (#937210) Homo sapiens cDNA clone IMAGE:119951 5' similar to gb:K01506
HLA CLASS II HISTOCOMPATIBILITY ANTIGEN, DP(1) ALPHA CHAIN (HUMAN);


Drosophila melanogaster laminin B2 gene, complete cds


Drosophila melanogaster laminin B2 gene, complete cds


Pseudomonas putida ttgS gene


Mus musculus caudal type homeobox-1 (Cdx-1) gene, complete cds


Helicobacter pylori 26695 section 5 of 134 of the complete genome


Helicobacter pylori 26695 section 5 of 134 of the complete genome


nuclear protein TIF1 isoform [mice, mRNA, 4053 nt]


HYPOTHETICAL 130.0 KD PROTEIN IN SNF6-SPO11 INTERGENIC REGION


Mus musculus 129/Sv cystatin C (cst3) gene, complete cds


Podospora anserina mitochondrial epsilon-sen DNA


Homo sapiens hCMT1b mRNA for mRNA (guanine-7-)methyltransferase, complete cds


Homo sapiens hCMT1b mRNA for mRNA (guanine-7-)methyltransferase, complete cds


D.rerio mRNA for zp-23 POU gene, splice variant (neurula, 9-18 hpf and postsomitogenesis, 20-28 hpf)


B.rerio pou[c] mRNA for transcription factor


ID NO:


34809


36467
36530
37070
26450
26451
26929
27961
26360


<o 
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30670| 


30671 1 
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co 
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CD 


36897| 


to 

§ 
in 
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Exon 
SEQ ID 

NO: 


25349
15785
17885
16641
20719
21659
23234
23292
23600
23800
13791
13791
14242


CO 
CM 

to 


15709
15709
15913
17749
18042


CM 
o 
co 


19529 


19944 


CD 

o 
CM 


s 

1 


21713
22680
22680
22804
22878


Probe 
SEQ ID 

NO: 


| 12797) 


CD 

O 
CO 


3416
3891
8024




10537
10598
11132
11132
1031


co 
o 


1495 


1 


2943
2943
3150
5029
5236
5236
6785
7260
7777
8304
9023
10032
10032
10156
10230
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Top Hit Descriptor 
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a. 


Homo sapiens meprin A, alpha (PABA peptide hydrolase) (MEP1A) mRNA
Homo sapiens partial LMO1 gene for LIM domain only 1 protein, exon 1
Homo sapiens partial LMO1 gene for LIM domain only 1 protein, exon 1
Arabidopsis thaliana putative dicarboxylate diiron protein (Crd1) mRNA, complete cds
Mus musculus cytokine inducible SH2-containing protein 3 (Cish3), mRNA
Human steroid hormone receptor Ner-I mRNA, complete cds
EST11352 Uterus Homo sapiens cDNA 5' end
Saccharomyces cerevisiae Cdc54p (CDC54) gene, complete cds
wj80e04.x1 NCI_CGAP_Lym12 Homo sapiens cDNA clone IMAGE:2409150 3' similar to cont
MER15 repetitive element ;
DNA POLYMERASE PROCESSIVITY FACTOR (POLYMERASE ACCESSORY PROTEIN)
BINDING GENE 18 PROTEIN)
Homo sapiens chromosome 21 segment HS21C004
Turnip mosaic virus genomic RNA for Capsid protein, complete cds
OXALOACETATE DECARBOXYLASE ALPHA CHAIN
DKFZp547D073_r1 547 (synonym: hfbr1) Homo sapiens cDNA clone DKFZp547D073 5'


Chlamydia trachomatis section 28 of 87 of the complete genome
Homo sapiens chromosome 21 segment HS21C046
HIV-1 patient 96 from Italy protease (pol) gene, complete cds
QVO-UM0051-250800-350-b08 UM0051 Homo sapiens cDNA
Human hypoxanthine phosphoribosyltransferase (HPRT) gene, complete cds
Human hypoxanthine phosphoribosyltransferase (HPRT) gene, complete cds
Spodoptera littoralis mRNA for 3-dehydroecdysone 3beta-reductase
KERATIN, TYPE I CYTOSKELETAL 14 (CYTOKERATIN 14) (K14) (CK 14)
KERATIN, TYPE I CYTOSKELETAL 14 (CYTOKERATIN 14) (K14) (CK 14)
Candida albicans protein phosphatase Ssd1 homolog (SSD1) gene, complete cds
ANTHER-SPECIFIC PROLINE-RICH PROTEIN APG (PROTEIN CEX)
Homo sapiens ES18 mRNA, partial cds
Homo sapiens ES18 mRNA, partial cds
Campylobacter jejuni NCTC11168 complete genome; segment 3/6
Cucumis melo polygalacturonase precursor (MPG3) mRNA, complete cds


NT 


NT 


NT 


NT 


NT 


NT 


IN I 


EST_HUMAN
EST_HUMAN
SWISSPROT
NT
NT
SWISSPROT


Z) 
X 

UI 


NT 


NT 


NT 


EST_HUMAN
NT
NT
NT
SWISSPROT
SWISSPROT
NT
SWISSPROT


NT 
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Top Hit Acession 
No. 


AF276815.1
5031908


i 

CM 


AJ277661.1
AF236101.1
6671757
U07132.1
AA297940.1
U14731.1


£ 

CD 
O 

s 


s 

CD 

£ 


AL163204.2
D10927.1
Q03030
AL134071.1
AE001301.1
AL163246.2
AF280369.1
BF378625.1
M26434.1
M26434.1
AJ131968.1
P02533
P02533
AF012898.1
P40603


d 

s 

3 
< 


AF083930.1
AL139076.2
AF062467.1
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(Top) Hit 
BLAST E 
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CM 




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1.43 
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1 


2.34 


CM 


1.23 


1.19 


3.02 


CD 

d 


S 

d 


0.96 


3.13 


2.19 


1.87 


1.93 


1.17 


1.03 


49.38 


0.72 


1.44 


0.84 


0.84 


1.48 


0.58 


0.58 


<S 
to 


1.89 


2.44 


CM 


1.3 
2.56 
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ORFSEQ 
ID NO: 


30834 




28516
28517
29310




29609 | 




31548| 




32932 


i 
c 


35473 1 






I 




g 

§ 

CO 


30584
33975
33976
34076
34622
34623
35556
35946
36681
36662
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o 

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Probe 
SEQ ID 

NO: 


12776 


2283
3112
3112


CO 


3921
4245
5053
5828
6016
7174
9608
9629
12414
2364
4179




6675
6760
8151
8151
8245
8783
8783
9709
10082
10733
10733
11620
12421



WO 01/57275
PCT/US01/00667



Top Hit Descriptor 
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i 
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I 


Mus musculus fatty acid amide hydrolase gene, exon 10
Bacillus subtilis complete genome (section 1 of 21): from 1 to 213080
SALIVARY ACIDIC PROLINE-RICH PHOSPHOPROTEIN 1/2 PRECURSOR (PRP-1/PRP-3) (PRP-2/PRP-4) (PIF-F/PIF-S) (PROTEIN A/PROTEIN C) [CONTAINS: PEPTIDE P-C]



Mus musculus Unc-51 like kinase 2 (C. elegans) (Ulk2), mRNA
Antheraea pernyi period clock protein homolog mRNA, complete cds
CASEIN KINASE II BETA CHAIN (CK II)
Gallus gallus tyrosine kinase JAK1 (JAK1) mRNA, complete cds
Mus musculus Dmp-1 gene, exons 1-6
NEUROFILAMENT TRIPLET L PROTEIN (NEUROFILAMENT LIGHT POLYPEPTIDE) (NF-L)
Mus musculus Fas-interacting serine/threonine kinase 3 (Fist3) mRNA, complete cds
801644753F1 NIH_MGC_55 Homo sapiens cDNA clone IMAGE:4070101 5'


Methanococcus jannaschii section 142 of 150 of the complete genome
NO-ON-TRANSIENT A PROTEIN
Chicken 28-kDa vitamin D-dependent calcium-binding protein (CaBP-28) mRNA, complete cds
Homo sapiens ABCA1 (ABCA1) gene, complete cds
Homo sapiens ABCA1 (ABCA1) gene, complete cds
ATROPHIN-1 (DENTATORUBRAL-PALLIDOLUYSIAN ATROPHY PROTEIN)
zq48a12.s1 Stratagene hNT neuron (#937233) Homo sapiens cDNA clone IMAGE:632926 3' similar to contains Alu repetitive element;contains element MSR1 repetitive element ;
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CO 
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s 
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z 
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| 
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X 

zt78a03.s1 Soares_testis_NHT Homo sapiens cDNA clone IMAGE:728428 3'
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8 
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O 

Rat elastase II gene, exon 6
Rat elastase II gene, exon 6
Archaeoglobus fulgidus section 127 of 172 of the complete genome
Chlamydia muridarum, section 40 of 85 of the complete genome
Arabidopsis thaliana DNA chromosome 4, contig fragment No. 59
TRANSCRIPTION FACTOR E3
Homo sapiens chromosome 21 segment HS21C016


Top Hit 
Database 
Source 


EST_HUMAN
NT
NT
SWISSPROT
NT
NT
NT
SWISSPROT
NT
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< 
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i 

3 

X 

fe 
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ai 
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NT 
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AA400914.1 | 
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AW 167821.1 I 
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L00122.1 i 


L00122.1
AE000980.1
AE002309.1
AL161559.2
P19532
AL163218.2
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BLAST E 
Value 


6.1E-02
5.0E-02
5.0E-02
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CO 
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ca 


4.9E-02
4.9E-02
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Homo sapiens solute carrier family 22 (organic cation transporter), member 1 (SLC22A1), mRNA | 


H.vulgare Ss1 gene for sucrose synthase
Homo sapiens genomic region containing hypervariable minisatellites chromosome 10[10q26.3] of Homo sapiens
C.glutamicum gap, pgk and tpi genes for glyceraldehyde-3-phosphate, phosphoglycerate kinase and triosephosphate isomerase
CM2-EN0013-110500-192-b10 EN0013 Homo sapiens cDNA
Chromatium vinosum sulfur globule protein Cv2 precursor (sgp2) gene, complete cds
nw20e05.s1 NCI_CGAP_GCB0 Homo sapiens cDNA clone IMAGE:1241024 3' similar to gb:J00314 rna2 TUBULIN BETA-1 CHAIN (HUMAN);
MRO-HT0158-030200-003-b08 HT0158 Homo sapiens cDNA
Dictyostelium discoideum unknown spore germination-specific protein-like protein, orf1, orf2 and orf3 genes, complete cds
Dictyostelium discoideum unknown spore germination-specific protein-like protein, orf1, orf2 and orf3 genes, complete cds


602020463F1 NCI_CGAP_Brn67 Homo sapiens cDNA clone IMAGE:4156116 5'
601820416F1 NIH_MGC_58 Homo sapiens cDNA clone IMAGE:4052570 5'
601820416F1 NIH_MGC_58 Homo sapiens cDNA clone IMAGE:4052570 5'
qk48b09.x1 NCI_CGAP_Co8 Homo sapiens cDNA clone IMAGE:1872185 3'
Drosophila melanogaster tiggrin mRNA, complete cds
Homo sapiens microsomal epoxide hydrolase (EPHX1) gene, complete cds
602085136F1 NIH_MGC_83 Homo sapiens cDNA clone IMAGE:4249377 5'
602085136F1 NIH_MGC_83 Homo sapiens cDNA clone IMAGE:4249377 5'
Thermotoga maritima section 85 of 138 of the complete genome
CYSTATHIONINE BETA-LYASE PRECURSOR (CBL) (BETA-CYSTATHIONASE) (CYSTEINE LYASE)
Maize actin 1 gene (MAc1), complete cds
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(Top) Hit 
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Value 
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26991 | 
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22165| 
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S 

CN 


13644| 
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CO 
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j 9914 1 
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8 
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Top Hit Descriptor 


S.griseocarneum whiG-Stv gene
Rat/polyomavirus left junction in cell line W98.14
yd33h12.s1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:110087 3' similar to contains Alu repetitive element;contains LTR1 repetitive element ;
Saguinus oedipus tissue kallikrein gene, complete cds
Homo sapiens cytochrome P450, subfamily IIB (phenobarbital-inducible) (CYP2B), mRNA
Mus musculus kinesin family member 3c (Kif3c), mRNA




qm17b04.x1 NCI_CGAP_Lu5 Homo sapiens cDNA clone IMAGE:1882063 3'
zg54b12.s1 Soares_pineal_gland_N3HPG Homo sapiens cDNA clone IMAGE:397151 3' similar to gb:L08441 CYTOCHROME C OXIDASE POLYPEPTIDE III (HUMAN);
Macaca mulatta chemokine receptor CCR5 mRNA, complete cds
Homo sapiens dual specificity phosphatase 4 (DUSP4) mRNA
NEURONAL ACETYLCHOLINE RECEPTOR PROTEIN, ALPHA-3 CHAIN PRECURSOR (GF-ALPHA-3)
Mus musculus adaptor-related protein complex AP-3, delta subunit (Ap3d), mRNA
Drosophila melanogaster mRNA for headcase protein
Human leukemia inhibitory factor receptor (LIFR) gene, promoter and partial exon 1


zs81a06.r1 NCI_CGAP_GCB1 Homo sapiens cDNA clone IMAGE:703858 5'
602066783F1 NIH_MGC_57 Homo sapiens cDNA clone IMAGE:4065789 5'
Neisseria meningitidis DNA for region 2 (fhaB- and fhaC-homologs, unknown genes) and flanking genes, strain FAM18
601658879R1 NIH_MGC_69 Homo sapiens cDNA clone IMAGE:3886291 3'
Enterococcus faecalis surface protein precursor, gene, complete cds
Mus musculus histidine rich calcium binding protein (Hrc), mRNA
Pityokteines minutus cytochrome oxidase I gene, partial cds; mitochondrial gene for mitochondrial product
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Pseudomonas fluorescens family II aminotransferase gene, complete cds j 
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EST74530 Pineal gland II Homo sapiens cDNA 5' end | 


Homo sapiens neuropilin 2 (NRP2) gene, complete cds, alternatively spliced
Homo sapiens neuropilin 2 (NRP2) gene, complete cds, alternatively spliced
Homo sapiens mRNA for KIAA1573 protein, partial cds


Top Hit
Database
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Top Hit Descriptor 


za39a10.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:294906 5' similar to contains element TAR1 repetitive element ;
za39a10.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:294906 5' similar to contains element TAR1 repetitive element ;
Cyprinus carpio mRNA for inducible nitric oxide synthase (INOS gene)
601512206F1 NIH_MGC_71 Homo sapiens cDNA clone IMAGE:3913848 5'
Homo sapiens nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 (NFKB1) gene, complete cds
Homo sapiens nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 (NFKB1) gene, complete cds




601854981F1 NIH_MGC_57 Homo sapiens cDNA clone IMAGE:4074548 5'
602154364F1 NIH_MGC_83 Homo sapiens cDNA clone IMAGE:4295654 5'
IL5-HT0704-290600-108-C04 HT0704 Homo sapiens cDNA
Ornithorhynchus anatinus coagulation factor X mRNA, complete cds
Thermotoga maritima section 109 of 136 of the complete genome
HSAAADTHS TEST1, Human adult Testis tissue Homo sapiens cDNA clone cam test244 (b)
Human coagulation factor VII (F7) gene exon 1 and factor X(F10) gene, exon 1
ne87f04.s1 NCI_CGAP_Kid1 Homo sapiens cDNA clone IMAGE:911263
QV4-NN0038-270400-187-h05 NN0038 Homo sapiens cDNA
Homo sapiens mitochondrial glutathione reductase and cytosolic glutathione reductase (GRD1) gene, complete cds, alternatively spliced
601338428F1 NIH_MGC_53 Homo sapiens cDNA clone IMAGE:3680695 5'
yu07e10.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:233130 5'
601452661F1 NIH_MGC_66 Homo sapiens cDNA clone IMAGE:3856598 5'
Neisseria meningitidis DNA for region 2 (fhaB- and fhaC-homologs, unknown genes) and flanking genes, strain FAM18
601140729F1 NIH_MGC_9 Homo sapiens cDNA clone IMAGE:3049830 5'
Top Hit Descriptor


601680305R2 NIH_MGC_83 Homo sapiens cDNA clone IMAGE:3950665 3'
Rattus norvegicus rabphilin-3A mRNA, complete cds
H.carterae mRNA for fucoxanthin chlorophyll a/c binding protein, Fcp1
H.carterae mRNA for fucoxanthin chlorophyll a/c binding protein, Fcp1
PM2-NN0128-080700-001-a12 NN0128 Homo sapiens cDNA
zx83c10.x5 Soares ovary tumor NbHOT Homo sapiens cDNA clone IMAGE:810354 3'
7e30e09.x1 NCI_CGAP_Lu24 Homo sapiens cDNA clone IMAGE:3284008 3' similar to contains L1.M L1 repetitive element ;




Chlamydomonas reinhardtii VSP-3 mRNA, complete cds
602070562F1 NCI_CGAP_Brn64 Homo sapiens cDNA clone IMAGE:4213406 5'
CHORDIN PRECURSOR (ORGANIZER-SPECIFIC SECRETED DORSALIZING FACTOR)
D.radicum 28S ribosomal RNA, D2 domain
HYPOTHETICAL 46.7 KD PROTEIN C19G10.05 IN CHROMOSOME I
HYPOTHETICAL 46.7 KD PROTEIN C19G10.05 IN CHROMOSOME I
Bos taurus partial stat5B gene, exons 17-19
Mus musculus major histocompatibility locus class II region: major histocompatibility protein class II alpha chain (IAalpha) and major histocompatibility protein class II beta chain (IEbeta) genes, complete cds; butyrophilin-like (NG9), butyrophilin-li>
Homo sapiens gene for LECT2, complete cds
Homo sapiens mitogen-activated protein kinase kinase kinase 13 (MAP3K13), mRNA


601652365R2 NIH_MGC_82 Homo sapiens cDNA clone IMAGE:3935513 3'
yr75f11.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:211149 5'
H-2 CLASS I HISTOCOMPATIBILITY ANTIGEN, K-B ALPHA CHAIN PRECURSOR (H-2K(B))
H-2 CLASS I HISTOCOMPATIBILITY ANTIGEN, K-B ALPHA CHAIN PRECURSOR (H-2K(B))
T.thermophila calcium-binding 25 kDa (TCBP 25) protein mRNA, complete cds
Dictyostelium discoideum extracellular signal-regulated protein kinase (ERK1) mRNA, complete cds
Arabidopsis thaliana DNA chromosome 4, contig fragment No. 32
Lasaea sp. isolate IBd cytochrome oxidase III gene, partial cds; mitochondrial gene for mitochondrial product
IL3-CT0219-160200-063-C07 CT0219 Homo sapiens cDNA
Homo sapiens chromosome 21 segment HS21C101
Mus musculus major histocompatibility complex region NG27, NG28, RPS28, NADH oxidoreductase, NG29, KIFC1, Fas-binding protein, BING1, tapasin, RalGDS-like, KE2, BING4, beta 1,3-galactosyl transferase, and RPS18 genes, complete cds; Sacm21 gene, partial>
Homo sapiens chromosome 21 segment HS21C103
Arabidopsis thaliana DNA chromosome 4, contig fragment No. 82
Arabidopsis thaliana DNA chromosome 4, contig fragment No. 82

Mus musculus histocompatibility 2, complement component factor B (H2-Bf), mRNA
Homo sapiens calcium channel alpha1E subunit (CACNA1E) gene, exons 7-49, and partial cds, alternatively spliced
INTEGRIN BETA-7 PRECURSOR (INTEGRIN BETA-P) (M290 IEL ANTIGEN)
AU140261 PLACE2 Homo sapiens cDNA clone PLACE2000223 5'
Mus musculus major histocompatibility complex region NG27, NG28, RPS28, NADH oxidoreductase, NG29, KIFC1, Fas-binding protein, BING1, tapasin, RalGDS-like, KE2, BING4, beta 1,3-galactosyl transferase, and RPS18 genes, complete cds; Sacm21 gene, partial>
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Top Hit Descriptor 


PROBABLE UBIQUITIN CARBOXYL-TERMINAL HYDROLASE FAF-Y (UBIQUITIN THIOLESTERASE FAF-Y) (UBIQUITIN-SPECIFIC PROCESSING PROTEASE FAF-Y) (DEUBIQUITINATING ENZYME FAF-Y) (FAT FACETS PROTEIN RELATED, Y-LINKED) (UBIQUITIN-SPECIFIC PROTEASE 9, Y CHROMOSOME)
Chlamydophila pneumoniae AR39, section 62 of 94 of the complete genome
Mus musculus AMD1 gene for S-adenosylmethionine decarboxylase, complete cds
Tursiops truncatus mRNA for p40-phox, complete cds
EST03012 Fetal brain, Stratagene (cat#936206) Homo sapiens cDNA clone HFBCR93 similar to EST containing Alu repeat
RC3-CT0255-031099-011-f07 CT0255 Homo sapiens cDNA
Homo sapiens MASL1 mRNA, complete cds


RC6-CT0281-081199-011-A05 CT0281 Homo sapiens cDNA
RC6-CT0281-081199-011-A05 CT0281 Homo sapiens cDNA
BETA-GALACTOSIDASE PRECURSOR (LACTASE)
Mouse complement receptor (CR2) mRNA, 3' end
Escherichia coli genomic DNA. (19.1 - 19.4 min)
Rabbit uteroglobin (UGL) gene, exon 1
SOF1 PROTEIN
Plasmodium berghei 58 kDa phosphoprotein mRNA, partial cds
Homo sapiens PRO0471 protein (PRO0471), mRNA
ag49e10.s1 Gessler Wilms tumor Homo sapiens cDNA clone IMAGE:1126290 3'
694F Heart Homo sapiens cDNA clone 694
xn59g05.x1 Soares_NHCeC_cervical_tumor Homo sapiens cDNA clone IMAGE:2698040 3' similar to contains L1.t2 L1 repetitive element ;
xn59g05.x1 Soares_NHCeC_cervical_tumor Homo sapiens cDNA clone IMAGE:2698040 3' similar to contains L1.t2 L1 repetitive element ;
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Mus musculus genomic fragment, 279 Kb, chromosome 7 
Mus musculus genomic fragment, 279 Kb, chromosome 7
7q74c09.x1 NCI_CGAP_Lu24 Homo sapiens cDNA clone IMAGE: 3' similar to
element;contains element MER31 repetitive element ;
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Homo sapiens protein kinase CK2 catalytic subunit alpha gene, exon 1 


Homo sapiens protein kinase CK2 catalytic subunit alpha gene, exon 1
Mus musculus Intestinal trefoil factor gene, partial cds
Mus musculus Intestinal trefoil factor gene, partial cds
Mus musculus mRNA for hypothetical protein (ORF2 ortholog)
MYOSIN HEAVY CHAIN, SMOOTH MUSCLE ISOFORM (SMMHC)
AV654352 GLC Homo sapiens cDNA clone GLCDUH10 3'
tq03b11.x1 NCI_CGAP_Ut3 Homo sapiens cDNA clone IMAGE:2207709 3'
EST11191 Uterus Homo sapiens cDNA 5' end similar to EST containing O family repeat
Mus musculus G protein coupled receptor gene, complete cds; and unknown gene
AU121712 MAMMA1 Homo sapiens cDNA clone MAMMA1000798 5'
QV0-CT0387-180300-167-610 CT0387 Homo sapiens cDNA
LINE-1 REVERSE TRANSCRIPTASE HOMOLOG


MYOMESIN 2 (M-PROTEIN) (165 KD TITIN-ASSOCIATED PROTEIN) (165 KD CONNECTIN-ASSOCIATED PROTEIN)
Solanum lycopersicum phytochrome F (PHYF) gene, partial cds
Solanum lycopersicum phytochrome F (PHYF) gene, partial cds
Homo sapiens DNA, DLEC1 to ORCTL4 gene region, section 1/2 (DLEC1, ORCTL3, ORCTL4 genes, complete cds)
Homo sapiens DNA, DLEC1 to ORCTL4 gene region, section 1/2 (DLEC1, ORCTL3, ORCTL4 genes, complete cds)
Homo sapiens FRA3B common fragile region, diadenosine triphosphate hydrolase (FHIT) gene, exon 5
Human immunoglobulin C(mu) and C(delta) heavy chain genes (constant regions)
ai22a12.s1 Soares_testis_NHT Homo sapiens cDNA clone 1343518 3'
GASTRULA ZINC FINGER PROTEIN XLCGF26.1
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Top Hit Descriptor 


ab90f10.s1 Stratagene lung (#937210) Homo sapiens cDNA clone IMAGE:854251 3' similar to contains MER20.t1 MER20 repetitive element ;
Homo sapiens KIAA0555 gene product (KIAA0555), mRNA
qw16g09.x1 NCI_CGAP_Ut3 Homo sapiens cDNA clone IMAGE:1991299 3' similar to contains Alu repetitive element;
EST99205 Thyroid Homo sapiens cDNA 5' end similar to EST containing L1 repeat
QV2-OT0062-250400-173-h01 OT0062 Homo sapiens cDNA
Homo sapiens DNA segment, numerous copies, expressed probes (GS1 gene) (DXF68S1E), mRNA


CERULOPLASMIN PRECURSOR (FERROXIDASE)
601881522F1 NIH_MGC_57 Homo sapiens cDNA clone IMAGE:4093972 5'
QV3-BT0379-010300-105-d11 BT0379 Homo sapiens cDNA
QV3-BT0379-010300-105-d11 BT0379 Homo sapiens cDNA
OVARIAN ABUNDANT MESSAGE PROTEIN (OAM PROTEIN)
ox08e02.x1 Soares_fetal_liver_spleen_1NFLS_S1 Homo sapiens cDNA clone IMAGE:1655738 3' similar to contains MER8.t2 MER8 repetitive element ;
Mus musculus E-cadherin binding protein E7 mRNA, complete cds
PROTEIN XE7
IL5-UM0070-110400-063-g02 UM0070 Homo sapiens cDNA
Homo sapiens calcium channel, voltage-dependent, alpha 1I subunit (CACNA1I), mRNA
Homo sapiens chromosome 21 segment HS21C048


Human ABL gene, exon 1b and intron 1b, and putative M8604 Met protein (M8604 Met) gene, complete cds
Homo sapiens gene for LECT2, complete cds
RC1-CT0302-120200-013-h02 CT0302 Homo sapiens cDNA
RC1-CT0302-120200-013-h02 CT0302 Homo sapiens cDNA
EST185496 Colon carcinoma (HCC) cell line Homo sapiens cDNA 5' end
COMPLEMENT C2 PRECURSOR (C3/C5 CONVERTASE)
HA0877 Human fetal liver cDNA library Homo sapiens cDNA
ya48c03.r1 Soares infant brain 1NIB Homo sapiens cDNA clone IMAGE:53254 5' similar to contains Alu repetitive element;contains L1 repetitive element ;
xc69g12.x1 NCI_CGAP_Eso2 Homo sapiens cDNA clone IMAGE:2589574 3' similar to contains Alu repetitive element;contains element MER21 repetitive element ;
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CM2-TNO140-07O900-372-g01 TN01 40 Homo sapiens cDNA j 


Homo sapiens mRNA for KIAA0027 protein, partial cds
Homo sapiens PHD finger protein 2 (PHF2) mRNA
nz88f11.s1 NCI_CGAP_GCB1 Homo sapiens cDNA clone IMAGE:1302573 3' similar to contains Alu repetitive element;
Homo sapiens FRA3B common fragile region, diadenosine triphosphate hydrolase (FHIT) gene, exon 5
xp45f12.x1 NCI_CGAP_HN11 Homo sapiens cDNA clone IMAGE:2743343 3' similar to contains Alu repetitive element;contains element MER9 repetitive element ;
Homo sapiens a disintegrin and metalloproteinase domain 29 (ADAM29), mRNA


te91c12.x1 NCI_CGAP_Pr28 Homo sapiens cDNA clone IMAGE:209407G 3' similar to TR:O00519 O00519 FATTY ACID AMIDE HYDROLASE. ;
xp45f12.x1 NCI_CGAP_HN11 Homo sapiens cDNA clone IMAGE:2743343 3' similar to contains Alu repetitive element;contains element MER9 repetitive element ;
Homo sapiens Xq pseudoautosomal region; segment 2/2
Homo sapiens Xq pseudoautosomal region; segment 2/2
Homo sapiens rhabdoid tumor deletion region protein 1 (RTDR1), mRNA
hv90g10.x1 NCI_CGAP_Lu24 Homo sapiens cDNA clone IMAGE:3180738 3' similar to contains Alu repetitive element;contains OFR.t1 OFR repetitive element ;
LINE-1 REVERSE TRANSCRIPTASE HOMOLOG
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element; 
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Top Hit Descriptor 


Mus musculus keratin-associated protein 9-1 (Krtap9-1), mRNA
ze34c09.r1 Soares retina N2b4HR Homo sapiens cDNA clone IMAGE:360880 5'
OLFACTORY RECEPTOR-LIKE PROTEIN OLF2
RC3-BT0333-250800-114-f04 BT0333 Homo sapiens cDNA
RC3-BT0333-250800-114-f04 BT0333 Homo sapiens cDNA
601304125F1 NIH_MGC_21 Homo sapiens cDNA clone IMAGE:3638310 5'
yo79g07.r1 Soares adult brain N2b4HB55Y Homo sapiens cDNA clone IMAGE:184188 5' similar to contains MER10 repetitive element ;
Human gene for Ah-receptor, exon 7-9
Homo sapiens protein tyrosine phosphatase, non-receptor type substrate 1 (PTPNS1) mRNA
aj49b12.s1 Soares_testis_NHT Homo sapiens cDNA clone IMAGE:1393631 3' similar to contains MER37.t2 MER37 repetitive element ;
Oryctolagus cuniculus sodium/dicarboxylate cotransporter mRNA, partial cds
nh22d03.s1 NCI_CGAP_Pr1 Homo sapiens cDNA clone IMAGE:953093 similar to contains L1.t1 L1 repetitive element ;


Oryctolagus cuniculus Na+/glucose cotransporter-related protein mRNA, complete cds
Oryctolagus cuniculus Na+/glucose cotransporter-related protein mRNA, complete cds
Homo sapiens pituitary tumor transforming gene protein (PTTG) gene, complete cds
Rabbit phosphorylase kinase beta subunit mRNA, complete cds
ye72b02.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:123243 5' similar to contains OFR repetitive element ;
Human dystrophin (DMD) gene, exons 7, 8 and 9, and partial cds
RCO-ST0174-191099-031-D05 ST0174 Homo sapiens cDNA
yy31e09.r1 Soares melanocyte 2NbHM Homo sapiens cDNA clone IMAGE:272872 5'
Homo sapiens MAGE-B2 (MAGE-B2), MAGE-B3 (MAGE-B3), MAGE-B4 (MAGE-B4), and MAGE-B1 (MAGE-B1) genes, complete cds
PM4-AN0096-050900-003-a04 AN0098 Homo sapiens cDNA
DKFZp547D092_r1 547 (synonym: hfbr1) Homo sapiens cDNA clone DKFZp547D092 5'
nl48c04.s1 NCI_CGAP_Pr4 Homo sapiens cDNA clone IMAGE:1043718 similar to contains MER29.b2 MER29 repetitive element ;
ht09g01.x1 NCI_CGAP_Kid13 Homo sapiens cDNA clone IMAGE:3146256 3' similar to contains MER29.b3 MER29 repetitive element ;
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Top Hit Descriptor 


AV661044 GLC Homo sapiens cDNA clone GLCGOA10 3'
601844465F1 NIH_MGC_54 Homo sapiens cDNA clone IMAGE:4064945 5'
RC1-OT0083-100800-019-g08 OT0083 Homo sapiens cDNA
Homo sapiens mRNA for KIAA0397 protein, partial cds
Homo sapiens mRNA for KIAA0397 protein, partial cds
RC4-BT0311-141199-011-H06 BT0311 Homo sapiens cDNA
ZONADHESIN PRECURSOR
ZONADHESIN PRECURSOR
ts30f03.x1 NCI_CGAP_Pan1 Homo sapiens cDNA clone IMAGE:2230109 3' similar
HYPOTHETICAL 51.1 KD PROTEIN ;
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Homo sapiens Hyperion gene, exons 1-50 


QVO-HT0103-091199-050-g11 HT0103 Homo sapiens cDNA
AU138779 PLACE1 Homo sapiens cDNA clone PLACE1005052 5'
601680636F1 NIH_MGC_83 Homo sapiens cDNA clone IMAGE:3951008 5'
601680636F1 NIH_MGC_83 Homo sapiens cDNA clone IMAGE:3951008 5'
Homo sapiens putative 8-hydroxyguanine DNA glycosylase gene, complete cds
nl46c04.s1 NCI_CGAP_Pr4 Homo sapiens cDNA clone IMAGE:1043718 similar to
MER29 repetitive element ;
ar88d12.x1 Barstead colon HPLRB7 Homo sapiens cDNA clone IMAGED 52343 3'
DKFZp434I0830_r1 434 (synonym: htes3) Homo sapiens cDNA clone DKFZp434I0
qg47e05.x1 Soares_testis_NHT Homo sapiens cDNA clone IMAGE:1838336 3' sim
PROTEIN (HUMAN);
Homo sapiens SET domain and mariner transposase fusion gene (SETMAR) mRN/
tz94a03.x1 NCI_CGAP_Kid11 Homo sapiens cDNA clone IMAGE:2296204 3' simila
NEUTRAL PROTEASE LARGE SUBUNIT ;
Homo sapiens chromosome 21 segment HS21C001
Homo sapiens chromosome 21 segment HS21C001
Top Hit Descriptor


as38h08.x1 Barstead aorta HPLRB6 Homo sapiens cDNA clone IMAGE:2319519 3' similar to WP:F49C12.11 CE03371 ;
EST33446 Embryo, 12 week II Homo sapiens cDNA 5' end
Homo sapiens upstream binding transcription factor, RNA polymerase I (UBTF), mRNA
601191345F1 NIH_MGC_7 Homo sapiens cDNA clone IMAGE:3535210 5'
Human DNA, SINE repetitive element
DKFZp434I066_r1 434 (synonym: htes3) Homo sapiens cDNA clone DKFZp434I066 5'
zn30d08.r1 Stratagene neuroepithelium NT2RAMI 937234 Homo sapiens cDNA clone IMAGE:548943 5' similar to gb:M14338 VITAMIN K-DEPENDENT PROTEIN S PRECURSOR (HUMAN);
zo30f10.r1 Stratagene colon (#937204) Homo sapiens cDNA clone IMAGE:588427 5' similar to TR:G695374 G695374 THYROID RECEPTOR INTERACTOR ;
zo30f10.r1 Stratagene colon (#937204) Homo sapiens cDNA clone IMAGE:588427 5' similar to TR:G695374 G695374 THYROID RECEPTOR INTERACTOR ;
601864963F1 NIH_MGC_57 Homo sapiens cDNA clone IMAGE:4083278 5'
Homo sapiens MLL (MLL) gene, exons 1-3, and partial cds
QV2-PT0012-040400-124-e05 PT0012 Homo sapiens cDNA
QV2-PT0012-040400-124-e05 PT0012 Homo sapiens cDNA
nn37d05.s1 NCI_CGAP_GC5 Homo sapiens cDNA clone IMAGE:1086057 3' similar to contains OFR.t1 OFR repetitive element ;
Mus musculus harmonin isoform b3 (Ush1c) mRNA, complete cds, alternatively spliced
MER30 repetitive element ;
Homo sapiens myotubularin related protein 7 mRNA, partial cds
Repetitive element;contains element MER20 MER20 repetitive element ;
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Top Hit Descriptor 


Homo sapiens envelope protein RIC-6 (env) gene, complete cds
wr65d10.x1 NCI_CGAP_Ut1 Homo sapiens cDNA clone IMAGE:2492563 3' similar to TR:O15546 O15546 HERV-E ENVELOPE GLYCOPROTEIN ;
wr65d10.x1 NCI_CGAP_Ut1 Homo sapiens cDNA clone IMAGE:2492563 3' similar to TR:O15546 O15546 HERV-E ENVELOPE GLYCOPROTEIN ;
Homo sapiens chromosome 21 segment HS21C068
O371e04o1 NCI_CGAP_GC2 Homo sapiens cDNA clone IMAGE:1610814 3' similar to contains L1.t2 L1 repetitive element ;
wf27g07.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2356860 3' similar to contains element MER6 repetitive element ;
wf27g07.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2356860 3' similar to contains element MER6 repetitive element ;
601442206F1 NIH_MGC_65 Homo sapiens cDNA clone IMAGE:3846648 5'
Homo sapiens DNA-binding protein (LOC56242), mRNA
Homo sapiens DNA-binding protein (LOC56242), mRNA
Homo sapiens chromosome 21 segment HS21C048
Homo sapiens chromosome 21 segment HS21C048
Homo sapiens chromosome 21 segment HS21C048
Homo sapiens chromosome 21 segment HS21C048
601669934F1 NIH_MGC_20 Homo sapiens cDNA clone IMAGE:3952833 5'
Homo sapiens splicing factor similar to dnaJ (SPF31), mRNA
QV0-OT0032-080300-155-d01 OT0032 Homo sapiens cDNA
R.rattus RYA3 mRNA for a potential ligand-binding protein
nz20c07.s1 NCI_CGAP_GCB1 Homo sapiens cDNA clone IMAGE:1288332 3' similar to contains MER4.b1 MER4 repetitive element ;
Homo sapiens zinc/iron regulated transporter-like (ZIRTL), mRNA
HSC23F051 normalized infant brain cDNA Homo sapiens cDNA clone c-23f05
EST97317 Thymus I Homo sapiens cDNA 5' end similar to EST containing O family repeat
Human mRNA for integrin alpha subunit, complete cds
QV0-BN0147-290400-214-f12 BN0147 Homo sapiens cDNA
Homo sapiens CTCL tumor antigen se20-10 mRNA, partial cds
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Top Hit Descriptor 


Human lambda-immunoglobulin constant region complex (germline)
tg92g03.x1 NCI_CGAP_CLL1 Homo sapiens cDNA clone IMAGE:2116276 3' similar to contains Alu repetitive element;
Human aconitate hydratase (ACO2) gene, exon 7
Homo sapiens chromosome 21 segment HS21C078
Homo sapiens chromosome 21 segment HS21C010
Homo sapiens chromosome 21 segment HS21C010
QV3-DT0043-090200-080-c06 DT0043 Homo sapiens cDNA
QV3-DT0043-090200-080-c08 DT0043 Homo sapiens cDNA
RETROVIRUS-RELATED POL POLYPROTEIN [CONTAINS: REVERSE TRANSCRIPTASE ; ENDONUCLEASE]
CM1-ST0181-091199-035-f08 ST0181 Homo sapiens cDNA
qq93c05.x1 Soares_total_fetus_Nb2HF8_9w Homo sapiens cDNA clone IMAGE:1938920 3' similar to contains MER29.b2 MER29 repetitive element ;
Homo sapiens telomerase reverse transcriptase (TERT) gene, exons 1-6
Rattus norvegicus putative four repeat ion channel mRNA, complete cds
ht09g01.x1 NCI_CGAP_Kid13 Homo sapiens cDNA clone IMAGE:3146258 3' similar to contains MER29.b3 MER29 repetitive element ;
Homo sapiens mRNA for KIAA1143 protein, partial cds
Homo sapiens mRNA for KIAA1143 protein, partial cds
TRANSCRIPTION FACTOR AP-2
CM0-CT0307-310100-158-h03 CT0307 Homo sapiens cDNA
HSC23F051 normalized infant brain cDNA Homo sapiens cDNA clone c-23f05
IL2-NT0101-280700-116-E04 NT0101 Homo sapiens cDNA
Homo sapiens Y-linked zinc finger protein (ZFY) gene, complete cds
601119860F1 NIH_MGC_17 Homo sapiens cDNA clone IMAGE:3029438 5'
601119860F1 NIH_MGC_17 Homo sapiens cDNA clone IMAGE:3029438 5'
601893208F1 NIH_MGC_17 Homo sapiens cDNA clone IMAGE:4138993 5'
ze58c10.r1 Soares retina N2b4HR Homo sapiens cDNA clone IMAGE:363188 5'
C18939 Human placenta cDNA (TFujiwara) Homo sapiens cDNA clone GEN-570C01 5'
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Top Hit Descriptor 


7e37c12.x1 NCI_CGAP_Lu24 Homo sapiens cDNA clone IMAGE:3284662 3' similar to SW:DHSA_HUMAN P31040 SUCCINATE DEHYDROGENASE [UBIQUINONE] FLAVOPROTEIN SUBUNIT PRECURSOR ;
7e37c12.x1 NCI_CGAP_Lu24 Homo sapiens cDNA clone IMAGE:3284682 3' similar to SW:DHSA_HUMAN P31040 SUCCINATE DEHYDROGENASE [UBIQUINONE] FLAVOPROTEIN SUBUNIT PRECURSOR ;
EST383657 MAGE resequences, MAGL Homo sapiens cDNA
ha33d06.x1 NCI_CGAP_Kid12 Homo sapiens cDNA clone IMAGE:2875499 3' similar to contains THR.b3 THR repetitive element ;
C18939 Human placenta cDNA (TFujiwara) Homo sapiens cDNA clone GEN-570C01 5'
hd30b04.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2910991 3' similar to contains MER1.t3 MER1 MER1 repetitive element ;
Homo sapiens chromosome 21 segment HS21C003
ac77b08.s1 Stratagene lung (#937210) Homo sapiens cDNA clone IMAGE:868599 3'
602022560F1 NCI_CGAP_Brn67 Homo sapiens cDNA clone IMAGE:4157991 5'
601809932F1 NIH_MGCJ8 Homo sapiens cDNA clone IMAGE:4040694 5'
CHR220532 Chromosome 22 exon Homo sapiens cDNA clone C22_728 5'
yc65e06.r1 Stratagene liver (#937224) Homo sapiens cDNA clone IMAGE:85570 5'
yc65e06.r1 Stratagene liver (#937224) Homo sapiens cDNA clone IMAGE:85570 5'
yf99b08.r1 Soares infant brain 1NIB Homo sapiens cDNA clone IMAGE:30586 5' similar to gb:X12953 RAS-RELATED PROTEIN RAB-2 (HUMAN);
yf99b08.r1 Soares infant brain 1NIB Homo sapiens cDNA clone IMAGE:30666 5' similar to gb:X12953 RAS-RELATED PROTEIN RAB-2 (HUMAN);
HSC05F032 normalized infant brain cDNA Homo sapiens cDNA clone c-05f03 3'
Rattus norvegicus putative four repeat ion channel mRNA, complete cds
Homo sapiens hypothetical protein FLJ20420 (FLJ20420), mRNA
OLFACTORY RECEPTOR 15 (OR3)
OLFACTORY RECEPTOR 15 (OR3)
hw05a11.x1 NCI_CGAP_Lu24 Homo sapiens cDNA clone IMAGE:3182012 3'
Homo sapiens V1-vascular vasopressin receptor AVPR1A gene, promoter region and partial cds
Homo sapiens V1-vascular vasopressin receptor AVPR1A gene, promoter region and partial cds
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Top Hit Descriptor 


AV736449 CB Homo sapiens cDNA clone CBFBIA08 5'
601573207F1 NIH_MGC_9 Homo sapiens cDNA clone IMAGE:3834433 5'
nw21g02.s1 NCI_CGAP_GCB0 Homo sapiens cDNA clone IMAGE:1241138 3' similar to contains THR.t3 THR repetitive element ;
hw07c05.x1 NCI_CGAP_Lu24 Homo sapiens cDNA clone IMAGE:3182216 3' similar to TR:O88539 O88539 WW DOMAIN BINDING PROTEIN 11. ;
Homo sapiens calcium channel alpha1E subunit (CACNA1E) gene, exons 7-49, and partial cds, alternatively spliced
602021164F1 NCI_CGAP_Brn67 Homo sapiens cDNA clone IMAGE:4156670 5'
Homo sapiens short-chain alcohol dehydrogenase family member (HEP27) mRNA
Homo sapiens short-chain alcohol dehydrogenase family member (HEP27) mRNA
to12b09.x1 NCI_CGAP_Ut2 Homo sapiens cDNA clone IMAGE:2178809 3' similar to contains OFR.t1 OFR repetitive element ;
AV730056 HTF Homo sapiens cDNA clone HTFAVE06 5'
Human hLRP mRNA for leukocyte common antigen-related peptide (protein-tyrosine phosphate) (EC 3.1.3.48)
602021164F1 NCI_CGAP_Brn67 Homo sapiens cDNA clone IMAGE:4156870 5'
EST383857 MAGE resequences, MAGL Homo sapiens cDNA
no16h01.s1 NCI_CGAP_Phe1 Homo sapiens cDNA clone IMAGE:1100881 3' similar to contains L1.t1 L1 repetitive element ;
Homo sapiens chromosome 21 segment HS21C085
HSPD21201 HM3 Homo sapiens cDNA clone s4000107H06
HSPD21201 HM3 Homo sapiens cDNA clone s4000107H06
Human glyceraldehyde-3-phosphate dehydrogenase (GAPDH) gene, complete cds
Homo sapiens similar to RAD23 (S. cerevisiae) homolog B (H. sapiens) (LOC63277), mRNA


Mus musculus SRY-box containing gene 6 (Sox6), mRNA
QV1-FT0169-100700-271-a02 FT0169 Homo sapiens cDNA
Homo sapiens solute carrier family 5 (choline transporter), member 7 (SLC5A7), mRNA
Homo sapiens spermidine synthase (SRM) mRNA
Homo sapiens spermidine synthase (SRM) mRNA
Homo sapiens KVLQT1 gene
Homo sapiens KVLQT1 gene
Homo sapiens DKFZp434P211 protein (DKFZP434P211), mRNA
Homo sapiens catenin (cadherin-associated protein), alpha 2 (CTNNA2), mRNA
Homo sapiens catenin (cadherin-associated protein), alpha 2 (CTNNA2), mRNA
EST364065 MAGE resequences, MAGB Homo sapiens cDNA
Homo sapiens DKFZp434P211 protein (DKFZP434P211), mRNA
Homo sapiens sema domain, seven thrombospondin repeats (type 1 and type 1-like), transmembrane domain (TM) and short cytoplasmic domain, (semaphorin) 5A (SEMA5A), mRNA
Homo sapiens sema domain, seven thrombospondin repeats (type 1 and type 1-like), transmembrane domain (TM) and short cytoplasmic domain, (semaphorin) 5A (SEMA5A), mRNA
7H15A04 Chromosome 7 HeLa cDNA Library Homo sapiens cDNA clone 7H15A04
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Top Hit Descriptor 


H.sapiens Ig lambda light chain variable region gene (7c.11.2) germline; Ig-Light-Lambda; VLambda
wj49c04.x1 NCI_CGAP_Lu19 Homo sapiens cDNA clone IMAGE:2406160 3' similar to contains THR.b2 THR repetitive element ;
ne06a09.s1 NCI_CGAP_Co3 Homo sapiens cDNA clone IMAGE:880408 3' similar to contains THR.b2 THR repetitive element ;
zi27a11.s1 Soares_fetal_liver_spleen_1NFLS_S1 Homo sapiens cDNA clone IMAGE:431998 3'
Homo sapiens Bruton's tyrosine kinase (BTK), alpha-D-galactosidase A (GLA), L44-like ribosomal protein (L44L) and FTP3 (FTP3) genes, complete cds
zt59e02.r1 Soares_testis_NHT Homo sapiens cDNA clone IMAGE:726650 5' similar to SW:RSP1_MOUSE Q01730 RSP-1 PROTEIN. ;
Mus musculus sperm tail associated protein (Step), mRNA
601445137F1 NIH_MGC_65 Homo sapiens cDNA clone IMAGE:3849297 5'
yr32d01.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:2U6977 5'
Homo sapiens KIAA0555 gene product (KIAA0555), mRNA
Homo sapiens KIAA0555 gene product (KIAA0555), mRNA
Homo sapiens centaurin-alpha 2 protein (HSA272195), mRNA
df50e03.y1 Morton Fetal Cochlea Homo sapiens cDNA clone IMAGE:2486861 5'
df50e03.y1 Morton Fetal Cochlea Homo sapiens cDNA clone IMAGE:2485861 5'




602072264F1 NCI_CGAP_Brn67 Homo sapiens cDNA clone IMAGE:4215398 5'
AV715377 DCB Homo sapiens cDNA clone DCBAIE03 5'
Homo sapiens Xq pseudoautosomal region; segment 1/2
Homo sapiens hypothetical protein FLJ13556 similar to N-myc downstream regulated 3 (FLJ13556), mRNA
Homo sapiens hypothetical protein FLJ13556 similar to N-myc downstream regulated 3 (FLJ13556), mRNA
Homo sapiens hypothetical protein FLJ13556 similar to N-myc downstream regulated 3 (FLJ13556), mRNA
Homo sapiens hypothetical protein FLJ13556 similar to N-myc downstream regulated 3 (FLJ13556), mRNA
Homo sapiens transforming growth factor, beta-induced, 68kD (TGFBI), mRNA
Homo sapiens transforming growth factor, beta-induced, 68kD (TGFBI), mRNA
Homo sapiens phosphoribosyl pyrophosphate synthetase-associated protein 2 (PRPSAP2) mRNA
Homo sapiens phosphoribosyl pyrophosphate synthetase-associated protein 2 (PRPSAP2) mRNA
Homo sapiens hydroxysteroid (17-beta) dehydrogenase 4 (HSD17B4), mRNA
Homo sapiens Ran GTPase activating protein 1 (RANGAP1), mRNA
Homo sapiens DNA for Human P2XM, complete cds
Homo sapiens hypothetical protein FLJ10675 (FLJ10675), mRNA
Human endogenous retroviral DNA (4-1), complete retroviral segment
Human endogenous retroviral DNA (4-1), complete retroviral segment
Human aldolase C gene for fructose-1,6-bisphosphate aldolase
Homo sapiens chromosome 21 segment HS21C027
Rattus norvegicus putative four repeat ion channel mRNA, complete cds
dfDBgOS.y1 Morton Fetal Cochlea Homo sapiens cDNA clone IMAGE:2483145 5'
Homo sapiens chromosome 21 segment HS21C002
Homo sapiens protein tyrosine phosphatase PTPCAAX1 (hPTPCAAX1) mRNA, complete cds
Homo sapiens proteasome (prosome, macropain) subunit, beta type, 2 (PSMB2), mRNA
Homo sapiens protein kinase, cAMP-dependent, regulatory, type II, beta (PRKAR2B) mRNA
Homo sapiens core binding factor alpha1 subunit (CBFA1) gene, exon 3
Homo sapiens heterogeneous nuclear ribonucleoprotein C (C1/C2) (HNRPC) mRNA
Homo sapiens chromosome 21 segment HS21C085
Homo sapiens hook1 protein (HOOK1), mRNA
42f6 Human retina cDNA randomly primed sublibrary Homo sapiens cDNA
Homo sapiens hypothetical protein FLJ11666 (FLJ11656), mRNA
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Top Hit Descriptor 


H.sapiens immunoglobulin kappa light chain variable region L14
Human MSH3 gene, exon 10
Homo sapiens TATA box binding protein (TBP) mRNA
wh60d06.x1 NCI_CGAP_Kid11 Homo sapiens cDNA clone IMAGE:2384171 3'
601458531F1 NIH_MGC_68 Homo sapiens cDNA clone IMAGE:3862086 5'
cn06h02.y1 Normal Human Trabecular Bone Cells Homo sapiens cDNA clone NHTBC_cn06h02 random
au93h05.x1 Schneider fetal brain 00004 Homo sapiens cDNA clone IMAGE:2783885 3' similar to TR:O75786 O75786 GANGLIOSIDE-INDUCED DIFFERENTIATION ASSOCIATED PROTEIN 1. ;
au93h05.x1 Schneider fetal brain 00004 Homo sapiens cDNA clone IMAGE:2783865 3' similar to TR:O75786 O75786 GANGLIOSIDE-INDUCED DIFFERENTIATION ASSOCIATED PROTEIN 1. ;
wf48c11.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2358839 3'
H.sapiens DNA for ZNF80-linked ERV9 long terminal repeat
au66c07.x1 Schneider fetal brain 00004 Homo sapiens cDNA clone IMAGE:2781228 3' similar to contains element TAR1 repetitive element ;
Homo sapiens polymerase (RNA) III (DNA directed) (39kD) (RPC39), mRNA
AV762869 MDS Homo sapiens cDNA clone MDSEIC12 5'
Homo sapiens hypothetical protein (LOC57143), mRNA
Human mRNA for KIAA0184 gene, partial cds
Homo sapiens catenin (cadherin-associated protein), delta 2 (neural plakophilin-related arm-repeat protein) (CTNND2), mRNA
Homo sapiens 17-beta-hydroxysteroid dehydrogenase IV (HSD17B4) gene, promoter region and exon 1
EST377682 MAGE resequences, MAGI Homo sapiens cDNA
Homo sapiens KIAA0680 gene product (KIAA0680), mRNA
Homo sapiens plasminogen activator, tissue (PLATa) mRNA
Homo sapiens plasminogen activator, tissue (PLATa) mRNA
Homo sapiens mRNA for KIAA1112 protein, partial cds
Homo sapiens mRNA for KIAA1112 protein, partial cds
Homo sapiens A kinase (PRKA) anchor protein 1 (AKAP1), mRNA
Homo sapiens A kinase (PRKA) anchor protein 1 (AKAP1), mRNA
Homo sapiens zona pellucida glycoprotein 2 (sperm receptor) (ZP2) mRNA
Homo sapiens chromosome 21 segment HS21C084
Homo sapiens Testis-specific XK-related protein on Y (XKRY) mRNA
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Database 
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CO 
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4507378| 
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BF035327.1 | 
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AW1 57281.1 


AW157281.1


AI807484.1


X83497.1


AW162304.1
$
D80006.1


11034B10


AF057720.1


AW965524.1
4502014


4502014
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AL1 63284.2 i 
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ORFSEQ 
ID NO: 




37739


27691


33904




33970


27189


27190


28530


29978


30526


34544


35443


36745


26210


31138
28535


29189


30007


30107
NO:
14856


14856


15891
16557


17372


17470
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| 2125| 
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Top Hit Descriptor 


Homo sapiens small nuclear ribonucleoprotein D3 polypeptide (18kD) (SNRPD3) mRNA
Homo sapiens differentiation-related gene 1 (nickel-specific induction protein) (RTP) mRNA
Homo sapiens differentiation-related gene 1 (nickel-specific induction protein) (RTP) mRNA
Homo sapiens mRNA for KIAA1081 protein, partial cds
hyaluronan-binding protein=hepatocyte growth factor activator homolog [human, plasma, mRNA, 2408 nt]
Homo sapiens phosphate cytidylyltransferase 1, choline, beta isoform (PCYT1B), mRNA
Human mRNA for integrin alpha-2 subunit
Homo sapiens S-antigen; retina and pineal gland (arrestin) (SAG), mRNA
Homo sapiens KIAA0433 protein (KIAA0433), mRNA
Homo sapiens KIAA0433 protein (KIAA0433), mRNA
Homo sapiens RAN binding protein 7 (RANBP7), mRNA
Homo sapiens chromosome 21 segment HS21C004
Homo sapiens MHC class 1 region
Homo sapiens MHC class 1 region
Homo sapiens interleukin 10 receptor, beta (IL10RB), mRNA
Homo sapiens cullin 4A (CUL4A) mRNA, complete cds
Homo sapiens mRNA for KIAA0581 protein, partial cds
Homo sapiens ornithine decarboxylase 1 (ODC1) mRNA
yr12f04.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:205087 5' similar to contains LTR5 repetitive element ;
yr12f04.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:205087 5' similar to contains LTR5 repetitive element ;
601658751R1 NIH_MGC_69 Homo sapiens cDNA clone IMAGE:3886069 3'
yq78h09.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:201953 5' similar to contains OFR repetitive element ;
wf52c07.x1 Soares_NFL_T_GBC_S1 Homo sapiens cDNA clone IMAGE:2359212 3'
EST11498 Uterus Homo sapiens cDNA 5' end similar to similar to retrovirus-related pol
hr81f05.x1 NCI_CGAP_Kid11 Homo sapiens cDNA clone IMAGE:3134913 3' similar to SW:RHOP_MOUSE Q61085 GTP-RHO BINDING PROTEIN 1 ;
Homo sapiens chromosome 21 segment HS21C078
Homo sapiens v-raf-1 murine leukemia viral oncogene homolog 1 (RAF1), mRNA
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AF055066.1 ) 
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Top Hit Descriptor 


Homo sapiens P/OKcl.19 mRNA for ubiquitin-conjugating enzyme E2, complete cds
Homo sapiens mRNA for CSR2, complete cds
Homo sapiens low density lipoprotein-related protein 2 (LRP2), mRNA
Homo sapiens low density lipoprotein-related protein 2 (LRP2), mRNA
Homo sapiens GTP binding protein 1 (GTPBP1), mRNA
RC4-BT0310-110300-015-f10 BT0310 Homo sapiens cDNA
oc66h11.s1 NCI_CGAP_GCB1 Homo sapiens cDNA clone IMAGE:1354725 3' similar to SW:POL_MLVRK P31795 POL POLYPROTEIN ;
AV714334 DCB Homo sapiens cDNA clone DCBAMA08 5'
NUCLEOLAR TRANSCRIPTION FACTOR 1 (UPSTREAM BINDING FACTOR 1) (UBF-1) (AUTOANTIGEN NOR-90)
Homo sapiens hypothetical protein (FLJ20261), mRNA
qg56a04.x1 Soares_testis_NHT Homo sapiens cDNA clone IMAGE:1839150 3' similar to TR:O15103 O15103 HYPOTHETICAL 27.3 KD PROTEIN. ;
Human zinc finger protein ZNF131 mRNA, partial cds


Homo sapiens CGI-18 protein (LOC51008), mRNA
MR3-ST0203-130100-025-609 ST0203 Homo sapiens cDNA
wx51e07.x1 NCI_CGAP_Lu28 Homo sapiens cDNA clone IMAGE:2547204 3' similar to SW:GG95_HUMAN Q08379 GOLGIN-95. ;contains element MER22 repetitive element ;
Homo sapiens Xq pseudoautosomal region; segment 1/2
Homo sapiens Xq pseudoautosomal region; segment 1/2
Human xanthine dehydrogenase/oxidase mRNA, complete cds
Human xanthine dehydrogenase/oxidase mRNA, complete cds
Homo sapiens ryanodine receptor 3 (RYR3) mRNA
Homo sapiens ryanodine receptor 3 (RYR3) mRNA
fh07g09.x1 NIH_MGC_17 Homo sapiens cDNA clone IMAGE:2961616 5'
Homo sapiens muscle specific gene (M9), mRNA
Homo sapiens muscle specific gene (M9), mRNA
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Top Hit Descriptor 


HSPD18178 HM3 Homo sapiens cDNA clone s3000023D09
Cricetulus longicaudatus mRNA for EF-1 alpha, complete cds
Homo sapiens gene for activin receptor type IIB, complete cds
IL3-CT0534-180900-273-A01 CT0534 Homo sapiens cDNA
FORMIN 4 (LIMB DEFORMITY PROTEIN)
QV0-BT0074-130999-014-g04 BT0074 Homo sapiens cDNA
Homo sapiens meningioma (disrupted in balanced translocation) 1 (MN1), mRNA
QV4-ST0234-181199-037-f05 ST0234 Homo sapiens cDNA
Homo sapiens mRNA for KIAA0577 protein, complete cds
Homo sapiens mRNA for KIAA0577 protein, complete cds
601177002F1 NIH_MGC_17 Homo sapiens cDNA clone IMAGE:3532344 5'
Homo sapiens cell recognition molecule Caspr2 (KIAA0863), mRNA
Homo sapiens sentrin/SUMO-specific protease (SENP1), mRNA
zw74d02.r1 Soares_testis_NHT Homo sapiens cDNA clone IMAGE:781923 5'


Homo sapiens phosphodiesterase 7B (PDE7B), mRNA
Homo sapiens MIF2 suppressor (HSMT3) mRNA, complete cds
Homo sapiens myosin IC (MYO1C), mRNA
Homo sapiens interleukin-7 receptor precursor (IL7R) gene, exons 7 and 8 and complete cds
Human protein kinase C substrate 80K-H (PRKCSH) gene, exon 4-5
Human protein kinase C substrate 80K-H (PRKCSH) gene, exon 4-5
Homo sapiens CGI-76 protein (LOC51632), mRNA
Homo sapiens CGI-76 protein (LOC51832), mRNA
Homo sapiens meningioma (disrupted in balanced translocation) 1 (MN1), mRNA
Homo sapiens low density lipoprotein-related protein 2 (LRP2), mRNA
Homo sapiens pre-B-cell colony-enhancing factor (PBEF) mRNA
Homo sapiens pre-B-cell colony-enhancing factor (PBEF) mRNA
Homo sapiens 26S proteasome-associated pad1 homolog (POH1) mRNA
Homo sapiens EphA4 (EPHA4) mRNA
Homo sapiens tumor suppressor deleted in oral cancer-related 1 (DOC-1R) mRNA
Homo sapiens MIST mRNA, partial cds
Homo sapiens MIST mRNA, partial cds
Homo sapiens mRNA for KIAA1294 protein, partial cds
Homo sapiens mRNA for KIAA1294 protein, partial cds
Human displacement protein (CCAAT) mRNA
Human displacement protein (CCAAT) mRNA
Human PBX3 mRNA
Homo sapiens phospholipid scramblase 1 gene, exon 1 and 5' flanking region
Homo sapiens karyopherin beta 2b, transportin (TRN2), mRNA
Homo sapiens karyopherin beta 2b, transportin (TRN2), mRNA
Homo sapiens glutamate-cysteine ligase (gamma-glutamylcysteine synthetase), catalytic (72.8kD) (GLCLC) mRNA
Homo sapiens NDST4 mRNA for N-deacetylase/N-sulfotransferase 4, complete cds
Homo sapiens NDST4 mRNA for N-deacetylase/N-sulfotransferase 4, complete cds
Homo sapiens spastic paraplegia 4 (autosomal dominant; spastin) (SPG4), mRNA
Homo sapiens spastic paraplegia 4 (autosomal dominant; spastin) (SPG4), mRNA
Homo sapiens HIR (histone cell cycle regulation defective, S. cerevisiae) homolog A (HIRA), mRNA
Homo sapiens HIR (histone cell cycle regulation defective, S. cerevisiae) homolog A (HIRA), mRNA
Homo sapiens amyloid beta (A4) precursor protein (protease nexin-II, Alzheimer disease) (APP), mRNA
Human Ku (p70/p80) subunit mRNA, complete cds
Homo sapiens CMP-N-acetylneuraminic acid synthase (LOC55907), mRNA
Homo sapiens KIAA0792 gene product (KIAA0792), mRNA
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Top Hit Descriptor 


EST188312 HCC cell line (metastasis to liver in mouse) II Homo sapiens cDNA 5' end similar to similar to FAC1
AV724832 HTB Homo sapiens cDNA clone HTBAKB01 5'
MR4-BT0598-010600-005-d05 BT0598 Homo sapiens cDNA
MR4-BT0598-010600-005-d05 BT0598 Homo sapiens cDNA
ba08g08.y1 NIH_MGC_7 Homo sapiens cDNA clone IMAGE:2823806 5'




Homo sapiens zinc finger protein ZFP-95 (ZFP95) mRNA, alternatively spliced, complete cds
yd93a01.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:115752 5' similar to SP:A44282 A44282 RETROVIRUS-RELATED POL POLYPROTEIN - HUMAN ;
Homo sapiens hect domain and RLD 2 (HERC2), mRNA
Homo sapiens hypothetical protein FLJ20758 (FLJ20758), mRNA
RC3-LT0023-200100-012-d11 LT0023 Homo sapiens cDNA
RC3-LT0023-200100-012-d11 LT0023 Homo sapiens cDNA
qh67c02.x1 Soares_fetal_liver_spleen_1NFLS_S1 Homo sapiens cDNA clone IMAGE:1849730 3' similar to TR:Q14498 Q14498 SPLICING FACTOR. [1] ;contains Alu repetitive element;contains element L1 repetitive element ;
aa23f09.s1 NCI_CGAP_GCB1 Homo sapiens cDNA clone IMAGE:814121 3' similar to SW:CPTR_FLAPR P49131 CHLOROPLAST TRIOSE PHOSPHATE TRANSLOCATOR PRECURSOR. ;
aa23f09.s1 NCI_CGAP_GCB1 Homo sapiens cDNA clone IMAGE:814121 3' similar to SW:CPTR_FLAPR P49131 CHLOROPLAST TRIOSE PHOSPHATE TRANSLOCATOR PRECURSOR. ;
yu28a03.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:235084 5'
Homo sapiens eukaryotic translation initiation factor 2B, subunit 2 (beta, 39kD) (EIF2B2), mRNA
Homo sapiens eukaryotic translation initiation factor 2B, subunit 2 (beta, 39kD) (EIF2B2), mRNA
yd29d09.s1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:109649 3'
Homo sapiens WEE1 gene for protein kinase and partial ZNF143 gene for zinc finger transcription factor
Homo sapiens pre-B-cell colony-enhancing factor (PBEF) mRNA
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Top Hit Descriptor 


Human chondroitin sulfate proteoglycan versican V0 splice-variant precursor peptide mRNA, complete cds
Human chondroitin sulfate proteoglycan versican V0 splice-variant precursor peptide mRNA, complete cds
Human gamma-aminobutyric acid transaminase mRNA, partial cds
TCAAP1E1252 Pediatric acute myelogenous leukemia cell (FAB M1) Baylor-HGSC project=TCAA Homo sapiens cDNA clone TCAAP1252
Homo sapiens 959 kb contig between AML1 and CBR1 on chromosome 21q22, segment 3/3
Homo sapiens hypothetical protein FLJ20585 (FLJ20585), mRNA
TCR V delta 2-C alpha =T-cell receptor delta and C alpha fusion gene {alternatively spliced, splice junction} [human, precursor B-cell line REH, mRNA Partial, 211 nt]
Homo sapiens hypothetical protein (FLJ11127), mRNA
Homo sapiens protein methyltransferase (JBP1) mRNA, complete cds
Homo sapiens protein methyltransferase (JBP1) mRNA, complete cds
wb31a08.x1 NCI_CGAP_GC6 Homo sapiens cDNA clone IMAGE:2307254 3'
Homo sapiens semaphorin W (SEMAW) mRNA
Homo sapiens growth factor receptor-bound protein 10 (GRB10) gene, exon 5
Homo sapiens growth factor receptor-bound protein 10 (GRB10) gene, exon 5
Homo sapiens mRNA for KIAA1081 protein, partial cds
Homo sapiens mRNA for KIAA1081 protein, partial cds
Homo sapiens ribosomal protein L3-like (RPL3L) mRNA
Homo sapiens basic transcription factor 2 p44 (btf2p44) gene, partial cds, neuronal apoptosis inhibitory protein (naip) and survival motor neuron protein (smn) genes, complete cds
Homo sapiens nuclear receptor subfamily 1, group H, member 3 (NR1H3), mRNA
Homo sapiens S100A12 gene for Calgranulin C, exon 2 and joined cds
Homo sapiens solute carrier family 13 (sodium-dependent dicarboxylate transporter), member 2 (SLC13A2), mRNA
601890419F1 NIH_MGC_17 Homo sapiens cDNA clone IMAGE:4131461 5'
601890419F1 NIH_MGC_17 Homo sapiens cDNA clone IMAGE:4131461 5'
Rattus norvegicus putative phosphate/phosphoenolpyruvate translocator mRNA, complete cds
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Top Hit Descriptor 


RC6-HT0678-220500-011-C03 HT0678 Homo sapiens cDNA


Homo sapiens PDZ-73 protein (PDZ-73/NY-CO-38), mRNA


Homo sapiens PDZ-73 protein (PDZ-73/NY-CO-38), mRNA


Homo sapiens PDZ-73 protein (PDZ-73/NY-CO-38), mRNA


Homo sapiens PDZ-73 protein (PDZ-73/NY-CO-38), mRNA


601557524F1 NIH_MGC_58 Homo sapiens cDNA clone IMAGE:3827549 5'


Homo sapiens mRNA for KIAA1395 protein, partial cds


Homo sapiens chromosome 21 segment HS21C004


zp96a06.s1 Stratagene muscle 937209 Homo sapiens cDNA clone IMAGE:628018 3'


Homo sapiens Misshapen/NIK-related kinase (MINK), mRNA


QV4-ST0234-181199-037-f05 ST0234 Homo sapiens cDNA


Homo sapiens hypothetical protein FLJ11026 (FLJ11026), mRNA


Homo sapiens beta 2 gene


Homo sapiens zinc finger protein 259 (ZNF259) mRNA


Homo sapiens mRNA for KIAA0833 protein, partial cds


Homo sapiens chromosome 21 segment HS21C046


Homo sapiens mannosidase, alpha, class 2A, member 1 (MAN2A1), mRNA


Homo sapiens glutamate receptor, ionotropic, kainate 1 (GRIK1) mRNA


Homo sapiens glutamate receptor, ionotropic, kainate 1 (GRIK1) mRNA


Homo sapiens chromosome 21 segment HS21C068


RC2-BT0642-270300-019-f06 BT0642 Homo sapiens cDNA


Human neurofibromin (NF1) gene, complete cds


Homo sapiens KIAA0852 protein (KIAA0852), mRNA


601070088F1 NIH_MGC_12 Homo sapiens cDNA clone IMAGE:3456260 5'


Homo sapiens tracheal epithelium enriched protein (PLUNC) gene, complete cds


MR0-HT0559-230500-021-a03 HT0559 Homo sapiens cDNA


Homo sapiens partial AK155 gene for AK155 protein, exons 1-3 and joined CDS


Homo sapiens partial AK155 gene for AK155 protein, exons 1-3 and joined CDS


Homo sapiens hypothetical protein FLJ10783 (FLJ10763), mRNA


ac83b02.y5 Stratagene lung (#937210) Homo sapiens cDNA clone IMAGE:869163 5' similar to TR:O14591
O14591 SIMILARITY TO P22059 ;


Human mRNA for possible protein TPRDII, complete cds


QV3-OT0028-220300-132-b11 OT0028 Homo sapiens cDNA


Homo sapiens EGF-like repeats and discoidin I-like domains 3 (EDIL3), mRNA


Gorilla gorilla olfactory receptor (GGOi8) gene, partial cds


Homo sapiens mRNA for KIAA1081 protein, partial cds


Homo sapiens A kinase (PRKA) anchor protein 10 (AKAP10), mRNA


Homo sapiens TPCR86 protein (HSTPCR86P), mRNA


Homo sapiens similar to ribosomal protein S26 (H. sapiens) (LOC63150), mRNA


Homo sapiens HIRA interacting protein 4 (dnaJ-like) (HIRIP4), mRNA


Human mRNA for HMG-1, complete cds


Human mRNA for HMG-1, complete cds


EST37301 Embryo, 8 week I Homo sapiens cDNA 5' end


yp11h02.r1 Soares breast 3NbHBst Homo sapiens cDNA clone IMAGE:187155 5' similar to
SP:ANKB_HUMAN Q01484 ANKYRIN, BRAIN VARIANT 1 ;


601866926F1 NIH_MGC_17 Homo sapiens cDNA clone IMAGE:4109503 5'


Homo sapiens proteasome (prosome, macropain) 26S subunit, non-ATPase, 7 (Mov34 homolog) (PSMD7)
mRNA


ze62e02.r1 Soares retina N2b4HR Homo sapiens cDNA clone IMAGE:363578 5'


ze62e02.r1 Soares retina N2b4HR Homo sapiens cDNA clone IMAGE:363578 5'


ye09f04.s1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone IMAGE:123007 3' similar to contains
MER10 repetitive element ;


zu91g01.s1 Soares_testis_NHT Homo sapiens cDNA clone IMAGE:745392 3'


Homo sapiens polymerase (RNA) II (DNA directed) polypeptide E (25kD) (POLR2E) mRNA


Homo sapiens polymerase (RNA) II (DNA directed) polypeptide E (25kD) (POLR2E) mRNA


Homo sapiens interferon (alpha, beta and omega) receptor 2 (IFNAR2) mRNA


Homo sapiens glucokinase (GCK) gene, exon 2


Homo sapiens disintegrin and metalloprotease domain 10 (ADAM10) mRNA


Homo sapiens tousled-like kinase 1 (TLK1) mRNA, complete cds


Human serine/threonine kinase MNB (mnb) mRNA, complete cds


Homo sapiens low density lipoprotein-related protein 2 (LRP2), mRNA


wa20b08.x1 NCI_CGAP_Kid11 Homo sapiens cDNA clone IMAGE:2298615 3'


Homo sapiens peptide YY (PYY), mRNA


RC2-BN0074-090300-014-C12 BN0074 Homo sapiens cDNA


Homo sapiens mRNA for activator of S phase Kinase, complete cds


Homo sapiens ubiquitin-conjugating enzyme E2E 3 (homologous to yeast UBC4/5) (UBE2E3) mRNA


Homo sapiens hypothetical protein FLJ20345 (FLJ20345), mRNA


Homo sapiens cAMP response element-binding protein CRE-BPa (H_GS165L15.1), mRNA


Homo sapiens cAMP response element-binding protein CRE-BPa (H_GS165L15.1), mRNA


Homo sapiens threonyl-tRNA synthetase (TARS), mRNA


Homo sapiens threonyl-tRNA synthetase (TARS), mRNA


Homo sapiens casein kinase II alpha subunit mRNA, complete cds


Homo sapiens casein kinase II alpha subunit mRNA, complete cds


Homo sapiens DNA for amyloid precursor protein, complete cds


Homo sapiens zinc finger protein 216 splice variant 1 (ZNF216) mRNA, complete cds


Homo sapiens zinc finger protein 216 splice variant 1 (ZNF216) mRNA, complete cds


Homo sapiens TRAF6-regulated IKK activator 1 beta Uev1A mRNA, complete cds


Homo sapiens suppressor of white apricot homolog 2 (SWAP2), mRNA


Homo sapiens suppressor of white apricot homolog 2 (SWAP2), mRNA


Homo sapiens chromosome 21 segment HS21C010


Homo sapiens period (Drosophila) homolog 3 (PER3), mRNA


601472766T1 NIH_MGC_68 Homo sapiens cDNA clone IMAGE:3875857 3'


zj94e04.s1 Soares_fetal_liver_spleen_1NFLS_S1 Homo sapiens cDNA clone IMAGE:462558 3' similar to
TR:Q15408 Q15408 NEUTRAL PROTEASE LARGE SUBUNIT ;


Homo sapiens chromosome 21 segment HS21C082


Homo sapiens hypothetical protein FLJ10283 (FLJ10283), mRNA


Homo sapiens intersectin short isoform (ITSN) mRNA, complete cds


Homo sapiens cell-line tsA201a chloride ion current inducer protein I(Cln) gene, complete cds


Human zinc finger protein ZNF131 mRNA, partial cds


Homo sapiens MSTP016 (MST016) mRNA, complete cds


Homo sapiens mRNA for KIAA0892 protein, partial cds


Homo sapiens H3 histone family, member J (H3FJ) mRNA


Homo sapiens HMT-1 mRNA for beta-1,4 mannosyltransferase, complete cds


Homo sapiens HMT-1 mRNA for beta-1,4 mannosyltransferase, complete cds


Mus musculus keratin complex 2, gene 6g (Krt2-6g), mRNA


HSPD13155 HM3 Homo sapiens cDNA clone s4000045F03


Homo sapiens chromosome 21 segment HS21C010


PM0-GN0018-040900-002-E03 GN0018 Homo sapiens cDNA


yg65a08.r1 Soares infant brain 1NIB Homo sapiens cDNA clone IMAGE:38060 5'


RET4B7 subtracted retina cDNA library Homo sapiens cDNA clone RET4B7


nn80d01.s1 NCI_CGAP_Co9 Homo sapiens cDNA clone IMAGE:1090177 3'


Homo sapiens Golgi transport complex protein (90 kDa) (GTC90), mRNA


yc86f12.r1 Soares infant brain 1NIB Homo sapiens cDNA clone IMAGE:22851 5' similar to
SP:K1CR_XENLA P08802 KERATIN, TYPE I CYTOSKELETAL ENDO B ;


EST376343 MAGE resequences, MAGH Homo sapiens cDNA


Homo sapiens GGT gene, exon 6


zt70f12.r1 Soares_testis_NHT Homo sapiens cDNA clone IMAGE:727727 5' similar to TR:G191315
G191315 ANDROGEN-DEPENDENT EXPRESSED PROTEIN. ;


Homo sapiens chromosome 21 segment HS21C103


Homo sapiens chromosome 21 unknown mRNA


nn01f12.x5 NCI_CGAP_Co9 Homo sapiens cDNA clone IMAGE:1076495 3' similar to contains OFR.t1 OFR
repetitive element ;


Homo sapiens gamma-aminobutyric acid (GABA) A receptor, gamma 2 (GABRG2) mRNA


Homo sapiens protein tyrosine phosphatase, receptor type, A (PTPRA), mRNA


Homo sapiens protein tyrosine phosphatase, receptor type, A (PTPRA), mRNA


Homo sapiens probable mannose binding C-type lectin DC-SIGNR mRNA, complete cds


Homo sapiens mRNA for KIAA0145 protein, partial cds


Homo sapiens similar to rat myomegalin (LOC64182), mRNA


Homo sapiens similar to rat myomegalin (LOC64182), mRNA


Homo sapiens meningioma (disrupted in balanced translocation) 1 (MN1), mRNA


Homo sapiens mRNA for KIAA0833 protein, partial cds


601111970F1 NIH_MGC_16 Homo sapiens cDNA clone IMAGE:3352840 5'


Homo sapiens ATP-binding cassette, sub-family A (ABC1), member 3 (ABCA3), mRNA


Homo sapiens chromosome 1p33-p34 beta-1,4-galactosyltransferase mRNA, complete cds


EST69129 Fetal lung II Homo sapiens cDNA 5' end


601312522F1 NIH_MGC_44 Homo sapiens cDNA clone IMAGE:3659284 5'


602153668F1 NIH_MGC_83 Homo sapiens cDNA clone IMAGE:4294601 5'


602153666F1 NIH_MGC_83 Homo sapiens cDNA clone IMAGE:4294601 5'


601125505F1 NIH_MGC_8 Homo sapiens cDNA clone IMAGE:3345480 5'


Homo sapiens mRNA for KIAA0454 protein, partial cds


Homo sapiens mRNA for KIAA0454 protein, partial cds


Human transforming growth factor-beta (tgf-beta) mRNA, complete cds


Human transforming growth factor-beta (tgf-beta) mRNA, complete cds


Homo sapiens hypothetical protein (FLJ11045), mRNA


Homo sapiens armadillo repeat gene deletes in velocardiofacial syndrome (ARVCF), mRNA


hn98d02.x1 NCI_CGAP_Co14 Homo sapiens cDNA clone IMAGE:3035907 3' similar to SW:COPG_BOVIN
P53620 COATOMER GAMMA SUBUNIT ;


Homo sapiens chondroitin sulfate proteoglycan 4 (melanoma-associated) (CSPG4), mRNA


Homo sapiens angiopoietin-3 (ANG-3), mRNA


Homo sapiens gamma-cytoplasmic actin (ACTGP3) pseudogene


zn03g10.r1 Stratagene hNT neuron (#937233) Homo sapiens cDNA clone IMAGE:546402 5'


zn03g10.r1 Stratagene hNT neuron (#937233) Homo sapiens cDNA clone IMAGE:546402 5' 


Homo sapiens titin (TTN) mRNA


Homo sapiens titin (TTN) mRNA


Bacillus amyloliquefaciens sacB gene for levansucrase (EC 2.4.1.10)


Homo sapiens vascular endothelial cadherin 2 mRNA, complete cds


Homo sapiens vascular endothelial cadherin 2 mRNA, complete cds


Homo sapiens ecotropic viral integration site 2A (EVI2A), mRNA


Homo sapiens ecotropic viral integration site 2A (EVI2A), mRNA


Homo sapiens HEF like Protein (HEFL), mRNA


Homo sapiens PR domain containing 1, with ZNF domain (PRDM1) mRNA


Homo sapiens putative GPR37 gene, exon 2


Homo sapiens putative GPR37 gene, exon 2


Human endogenous retrovirus-K, LTR U5 and gag gene


Homo sapiens potassium inwardly-rectifying channel, subfamily J, member 16 (KCNJ16), mRNA


Homo sapiens potassium inwardly-rectifying channel, subfamily J, member 16 (KCNJ16), mRNA


Homo sapiens 4F2 light chain (LOC51597), mRNA


Homo sapiens 4F2 light chain (LOC51597), mRNA


Homo sapiens deleted in bladder cancer chromosome region candidate 1 (DBCCR1), mRNA


Homo sapiens mRNA for KIAA0559 protein, partial cds 


Mus musculus mRNA for leucine-rich repeat protein, partial cds


Rattus norvegicus multidomain presynaptic cytomatrix protein Piccolo mRNA, complete cds, long splice
variant


Rattus norvegicus multidomain presynaptic cytomatrix protein Piccolo mRNA, complete cds, long splice
variant


Homo sapiens toll-like receptor 7 (TLR7) mRNA, complete cds


Homo sapiens glutamate receptor, ionotropic, N-methyl D-aspartate 2A (GRIN2A) mRNA


Homo sapiens glutamate receptor, ionotropic, N-methyl D-aspartate 2A (GRIN2A) mRNA


EST367889 MAGE resequences, MAGD Homo sapiens cDNA


Homo sapiens mRNA for KIAA1513 protein, partial cds


Homo sapiens gababr1 receptor gene, exon 6